class: center, middle, inverse, title-slide .title[ # Excel Lab ] .author[ ### S. Mason Garrison ] --- layout: true <div class="my-footer"> <span> <a href="https://psychmethods.github.io/coursenotes/" target="_blank">Methods in Psychological Research</a> </span> </div> --- class: middle # Excel Lab ## Z-Scores, Correlation, and Regression with the Iris Dataset --- # Lab Overview and Dataset Features .pull-left[ ## Overview - In this lab, you will use Excel to explore **Z-scores**, **correlation**, and **regression analysis** with the famous Iris dataset. - The Iris dataset, introduced by statistician Ronald A. Fisher in 1936, is a classic dataset in statistics and machine learning. ] -- .pull-right[ ## Key Features: - 150 observations of iris flowers. - It includes four numeric features: - Sepal.Length - Sepal.Width - Petal.Length - Petal.Width - The flowers belong to three species: - Setosa - Versicolor - Virginica ] --- class: middle, inverse # Task 1 ## Working with Z-Scores in Excel --- # Task 1: Overview .pull-left[ In this task, we will: - Work with the **Sepal.Length** column to calculate: - Z-scores - Percentiles - Inverse normal values (`NORM.INV`) ] .pull-right[ <img src="data:image/png;base64,#../img/DALLE_exceldoodle.png" width="90%" style="display: block; margin: auto;" /> ] --- ## Step 0: Download the Iris Dataset .pull-left-wide[ ```r # Save the dataset as an Excel file write.xlsx(iris, "IrisData.xlsx") ``` [Download the Excel file here](https://github.com/psychmethods/slides/raw/refs/heads/main/t08_lab/IrisData.xlsx) ] .pull-right-narrow[ .center[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" /> ]] .footnote[or here https://github.com/psychmethods/slides/raw/refs/heads/main/t08_lab/IrisData.xlsx] --- # Step 1: Calculate the Mean .pull-left[ 1. Open the Excel file `IrisData.xlsx`. 2. Select the cell where you want to calculate the mean (e.g., cell B152). 3. Click the cell and check the Formula Bar at the top of the screen. 4. In the Formula Bar, enter the formula to calculate the mean: - `=AVERAGE(A2:A151)` 5. Press Enter to execute the formula, and the mean value will appear in the selected cell. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" /> **Explanation**: - **Formula Bar**: This is where you can enter or edit formulas for the selected cell. - **Formula Entry**: You can type any valid Excel formula here. In this case, `=AVERAGE()` calculates the sample mean. ] --- # Step 2: Calculate the Standard Deviation .pull-left[ 1. **Select a new cell** (e.g., B153) to store the standard deviation result. - Click on the cell and use the **Formula Bar** to enter the formula. 2. **In the Formula Bar**, type the following formula to calculate the standard deviation of `Sepal.Length`: - `=STDEV.S(A2:A151)` 3. Press **Enter**, and Excel will compute the standard deviation in the selected cell. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Step 3: Calculate Z-Scores .pull-left[ 1. **Select cell F2** (next to the 1st value in the `Sepal.Length` column) to enter the formula. 2. In the **Formula Bar**, enter the Z-score formula: - `=(A2 - [Mean Cell]) / [Standard Deviation Cell]` - Replace `[Mean Cell]` with the cell where you calculated the mean (e.g., `$F$152`) and `[Standard Deviation Cell]` with the cell containing the standard deviation (e.g., `$F$153`). 3. Press **Enter** to get the Z-score for the 1st value. 4. **Fill down the formula**: Click on the small square at the bottom-right corner of the cell (called the **fill handle**) and drag down to copy the formula for the rest of the column. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" /> .medi[ **Explanation**: - **Fill Handle**: A small square at the bottom-right corner of the selected cell that allows you to drag and apply the same formula to other cells. - **Relative References**: The formula updates the cell references automatically when dragging down. Use `$` signs (e.g., `$F$152`) to fix a cell reference. ] ] --- # Step 4: Calculate Percentiles .pull-left[ 1. **Select cell G2** to calculate the cumulative percentile for the first Z-score. 2. **In the Formula Bar**, type the following formula: - `=NORM.S.DIST(F2, TRUE)` 3. Press **Enter** to calculate the percentile for the Z-score. 4. **Fill down the formula**: Use the **fill handle** to drag down and calculate percentiles for all Z-scores in the column. ] .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-8-1.png" width="60%" style="display: block; margin: auto;" /> **Explanation**: - **NORM.S.DIST()**: calculates the cumulative probability of a Z-score. The `TRUE` argument tells Excel to return the cumulative probability. ] --- # Step 5.0: Use `NORM.INV()` to Find Raw Scores - Now, let's reverse the process: Suppose you want to find the raw score (e.g., `Sepal.Length`) that corresponds to a specific percentile, like the **90th percentile**. - We will use the **`NORM.INV()`** function in Excel to do this. --- # Step 5.1: Use `NORM.INV()` to Find Raw Scores .pull-left[ 1. **Select a new cell** (e.g., B154) where you want to calculate the raw score for the 90th percentile. 2. **In the Formula Bar**, enter the following formula: - `=NORM.INV(0.90, [Mean Cell], [Standard Deviation Cell])` - Replace `[Mean Cell]` and `[Standard Deviation Cell]` with the cell references for the mean (e.g., `$B$152`) and standard deviation (e.g., `$B$153`), respectively. 3. Press **Enter**, and Excel will calculate the raw score corresponding to the 90th percentile. ] .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" /> .medi[ **Explanation**: - **NORM.INV()** calculates the inverse of the normal cumulative distribution for a given probability. Here, the probability is `0.90`, representing the 90th percentile, and returns the raw score (in this case, a `Sepal.Length` value) that aligns to that percentile. ] ] --- # Task 1 Recap: Working with Z-Scores in Excel In this section, you: - Calculated the **mean** and **standard deviation** of `Sepal.Length`. - Computed **Z-scores** and **percentiles** using Excel. - Used **`NORM.INV()`** to find the raw score corresponding to a given percentile (e.g., the 90th percentile). --- class: center, middle, inverse # Task 2 ## Explore Bivariate Relationships and Correlation --- # Task 2: Overview .pull-left[ - **Visualize the relationship** between two variables using scatter plots. - **Calculate the correlation coefficient** to measure the strength and direction of the linear relationship between two variables. ] --- ## Step 0: Download the Iris Dataset .pull-left[ We'll continue to work with two columns from the **Iris dataset**: - **Sepal.Length** (the length of the sepal). - **Petal.Length** (the length of the petal). [Download the Excel file here](https://github.com/psychmethods/slides/raw/refs/heads/main/t08_lab/IrisData_Bivariate.xlsx) <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-11-1.png" width="50%" style="display: block; margin: auto;" /> ] .pull-right[ .center[ <img src="data:image/png;base64,#../img/DALLE_Fisher_looking_at_flowers.png" width="70%" style="display: block; margin: auto;" /> ]] .footnote[This way you can revisit the lab later if needed.] --- # Step 1: Create a Scatter Plot .pull-left.medi[ 2. **Select the data** for the scatter plot: - Click and drag to highlight both the `Sepal.Length` and `Petal.Length` columns (A1:B151). 3. **Insert a Scatter Plot**: - Go to the **Insert** tab in Excel. - In the **Charts** section, click on **Scatter Plot** and select the option for a **simple scatter plot**. 4. **Format the Scatter Plot**: - Add axis labels: - **X-axis**: Sepal Length - **Y-axis**: Petal Length - Title the chart: "Scatter Plot: Sepal Length vs. Petal Length." ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Step 2: Calculate the Correlation Coefficient .pull-left[ 1. **Select a new cell** (e.g., `B152`) where you want to display the correlation coefficient. 2. **Use the `CORREL()` function** to calculate the correlation: - In the **Formula Bar**, enter the following formula: - `=CORREL(A2:A151, B2:B151)` 3. Press **Enter** to calculate the correlation coefficient between `Sepal.Length` and `Petal.Length`. ] .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-14-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Task 2 Recap: Scatter Plot and Correlation Coefficient - In this section, you: - Created a **scatter plot** to visualize the relationship between Sepal Length and Petal Length in the Iris dataset. - Calculated the **correlation coefficient** to quantify the strength and direction of the linear relationship between these two variables. .reminder[Be sure to save your Excel file with the scatter plot and correlation results for future reference.] --- class: middle, inverse # Task 3 ## Perform Regression Analysis in Excel --- # Task 3: Overview - In this task, we will: - **Fit a linear regression model** to the data. - **Interpret the slope and intercept** of the regression line. - **Use the regression equation** to predict new values based on the relationship between Sepal Length and Petal Length. --- # Step 0: Redownload the Iris Dataset - You will continue to work with the Iris dataset for the regression analysis. - This time, you’ll use `Sepal.Length` as the independent variable (X) and `Petal.Length` as the dependent variable (Y). [Download the Excel file here](https://github.com/psychmethods/slides/raw/refs/heads/main/t08_lab/IrisData_Regression.xlsx) --- # Step 1: Calculate the Slope .pull-left[ 1. **Open the dataset** `IrisData_Regression.xlsx` in Excel. 2. **Select a new cell** (e.g., C152) to calculate the slope. 3. **In the Formula Bar**, enter the following formula to calculate the slope: - `=SLOPE(B2:B151, A2:A151)` This calculates the slope of the regression line, where `Sepal.Length` (A2:A151) is the independent variable and `Petal.Length` (B2:B151) is the dependent variable. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-16-1.png" width="50%" style="display: block; margin: auto;" /> **Explanation**: - The **slope** tells us how much `Petal.Length` is expected to change for each unit increase in `Sepal.Length`. ] --- # Step 2: Calculate the Intercept .pull-left[ 1. **Select a new cell** (e.g., C153) to calculate the intercept. 2. **In the Formula Bar**, enter the following formula: - `=INTERCEPT(B2:B151, A2:A151)` 3. Press **Enter** to calculate the intercept. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-17-1.png" width="50%" style="display: block; margin: auto;" /> **Explanation**: - The **intercept** represents the predicted value of `Petal.Length` when `Sepal.Length` is zero. ] --- # Step 3: Calculate the R-squared Value .pull-left[ 1. **Select a new cell** (e.g., C154) to calculate the R-squared value. 2. **In the Formula Bar**, enter the following formula: - `=RSQ(B2:B151, A2:A151)` 3. Press **Enter** to calculate the R-squared value. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-18-1.png" width="90%" style="display: block; margin: auto;" /> **Explanation**: - The **R-squared value** tells us how well the regression line fits the data. Values closer to 1 indicate more variance explained ] --- # Step 4: Use `LINEST()` for Multiple Outputs .pull-left[ 1. **Select a new cell** (e.g., C155) to calculate multiple regression outputs at once. 2. **In the Formula Bar**, enter the following formula to calculate the slope, intercept, and R-squared together: - `=LINEST(B2:B151, A2:A151, TRUE, TRUE)` 3. Press **Enter**, and Excel will return the slope, intercept, and additional statistical information, including the R-squared value. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" /> **Explanation**: - The **`LINEST()` function** allows you to calculate multiple regression statistics in one step, including slope, intercept, and R-squared. ] --- # Step 5: Apply the Regression Equation to Predict Values .pull-left[ 1. **Use the regression equation** you generated from the slope and intercept to predict values of `Petal.Length` for new values of `Sepal.Length`. 2. **Example**: - If your equation is \( Y = 1.86X + -7.1 \), and you want to predict `Petal.Length` for a `Sepal.Length` of 7, substitute \( X = 7 \) into the equation: \[ Y = 1.86(7) + -7.1 = 13.02 + -7.1 = 5.92 \] - The predicted `Petal.Length` for a `Sepal.Length` of 7 is 5.92. ] -- .pull-right[ <img src="data:image/png;base64,#13_Lab_files/figure-html/unnamed-chunk-21-1.png" width="90%" style="display: block; margin: auto;" /> **Practical Task**: - Try predicting the `Petal.Length` for a `Sepal.Length` of 3.5 using your own regression equation from the Excel output. ] --- # Task 3 Recap: Regression Analysis and Prediction - In this section, you: - Performed linear regression using Excel’s **SLOPE()**, **INTERCEPT()**, **RSQ()**, and optionally **LINEST()** functions. - Interpreted the **slope** and **intercept** of the regression equation. - Used the regression equation to **predict new values** for `Petal.Length`. .reminder[Save your Excel file with the regression output and predictions for future reference.] --- class: center, middle, inverse # Recap of Recaps - In this lab, you explored various statistical concepts using Excel: - **Z-scores**, **percentiles**, and **inverse normal values**. - **Scatter plots** and **correlation coefficients**. - **Linear regression analysis** and **prediction**.