A regression line allows you to use one variable's values to predict the values of another. In this video, learn how to use Excel to calculate a regression line's properties and make predictions.
- [Instructor] We turn now to an extremely valuable statistical tool, the regression line. Let's see why it's so valuable. The idea is to use data from one or more measures to predict data for one or more other measures. In the simplest case, you use data for one measure, the independent variable, to predict data for another, the dependent variable, the one you're trying to predict. Here's an example. A company creates an aptitude test to predict job performance. They test 20 current employees for whom they have job performance data. The goal is to test applicants and predict their performance as part of the hiring decision. A good first step is to visualize the data, and a scatterplot does just that. The aptitude score, the independent variable, is on the x-axis, and performance, the dependent variable, is on the y-axis. Each point represents one employee, positioned horizontally by the employee's aptitude score, the x-distance, and vertically by their performance, the y-distance. Here's a very important point. If we had no information about each employee's aptitude, the best prediction of performance for any applicant would be the average performance of the current employees. That's all we would have to go on. Performance is on the y-axis, so that average is y-bar, and for this data set, y-bar equals 31.9. What we need is a way to summarize the relationship between aptitude score and performance, so that we can make a more precise prediction of performance based on aptitude, and that's the job of the regression line. The regression line is the line of best fit through a scatterplot. It summarizes the relationship between the independent variable, the predictor variable, and the dependent variable, the predicted variable. Now, an infinite number of lines is possible through the scatterplot, but only one fits best, and that's the regression line. It summarizes the relationship in the scatterplot similar to the way a mean summarizes a set of scores. The regression line minimizes the sum of squared distances in the y-direction from the points to the line. Any point on the line represents two values: a value of the independent variable, x, and a predicted value of the dependent variable, y. Notice the little symbol over the y. That lets you know that we're talking about a predicted value. You'd read that as y hat.
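The least-squares idea described above can be written compactly. Using the transcript's notation (x for aptitude, y for performance, ŷ for a predicted value, ȳ for the mean performance), and writing a for the intercept and b for the slope as the video does next, the regression line is the pair (a, b) that makes the sum of squared vertical distances as small as possible. Without any information about x, the prediction for everyone falls back to ȳ.

```latex
\hat{y}_i = a + b\,x_i, \qquad
(a, b) = \operatorname*{arg\,min}_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2
\qquad \text{baseline with no information about } x: \; \hat{y} = \bar{y}
```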
Like any line in a graph, a regression line has a slope, how slanted it is, and an intercept, where it crosses the y-axis. An important part of regression analysis is to use the data to determine the slope and the intercept. A regression line has an equation based on its slope and its intercept. In words, the equation is: predicted y equals the intercept plus the slope times x. In mathematical form, it's ŷ = a + bx. Once a and b are known, you supply a value for x, and the equation returns a predicted value, y hat. Here's how to find the slope and the intercept. By the way, they're called regression coefficients. For the slope, you use this formula. For each individual, you subtract the mean of x from their x-score and the mean of y from their y-score, and multiply the results. Add up all those products, and divide by the sum of squared deviations from the mean of x. Once you've found the slope, you can find the intercept. It's the mean of the y-scores minus the slope times the mean of the x-scores. Now we'll use Excel to find the slope and the intercept of a regression line. Here's the data for the aptitude and performance example. Column D shows each employee's name, column E shows their aptitude score, and column F shows their performance score. We'll put the slope in cell J8 and the intercept in cell J10. Click in cell J8 and type equals SLOPE, open paren. For the known y's, select the data in column F, beginning with cell F2, comma. For the known x's, select the data in column E, beginning with cell E2, close paren, and press Enter, and that's the slope. Now click in cell J10 and type equals INTERCEPT, open paren. For the known y's, select the data in column F, beginning with cell F2, comma. For the known x's, select the data in column E, beginning with cell E2, close paren, and press Enter, and there's the intercept. Now that we've calculated the slope and the intercept, we can use the regression line to make predictions. Here's an example. A job applicant takes the aptitude test, and her aptitude score is 40. Plugging 40 into the equation, we arrive at 38.04 for the prediction of her performance.
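As a cross-check on the hand formulas, here is a minimal Python sketch of what Excel's SLOPE and INTERCEPT compute. The data are hypothetical values invented for illustration, not the 20 employees from the video, so the results will not match 31.9 or 38.04. The helper names `slope`, `intercept`, and `predict` are our own; the first two simply take arguments in the same (known y's, known x's) order as the Excel functions.

```python
# Hypothetical data for illustration only; NOT the video's 20 employees.
aptitude = [20, 30, 40, 50, 60]       # x: independent (predictor) variable
performance = [18, 26, 30, 38, 43]    # y: dependent (predicted) variable


def mean(values):
    return sum(values) / len(values)


def slope(known_ys, known_xs):
    """Least-squares slope b (hypothetical helper, same argument order as Excel's SLOPE)."""
    x_bar, y_bar = mean(known_xs), mean(known_ys)
    # Multiply each person's x-deviation by their y-deviation and add up the products.
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    # Divide by the sum of squared deviations from the mean of x.
    sum_sq_x = sum((x - x_bar) ** 2 for x in known_xs)
    return sum_products / sum_sq_x


def intercept(known_ys, known_xs):
    """Intercept a = mean of y minus slope times mean of x (hypothetical helper)."""
    return mean(known_ys) - slope(known_ys, known_xs) * mean(known_xs)


def predict(x, a, b):
    """Predicted value y-hat = a + b * x."""
    return a + b * x


b = slope(performance, aptitude)
a = intercept(performance, aptitude)
print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
print(f"predicted performance for aptitude 50: {predict(50, a, b):.2f}")
```

For these made-up numbers the sketch prints a slope of 0.6200, an intercept of 6.2000, and a prediction of 37.20 for an aptitude score of 50. Because the regression line always passes through the point (x̄, ȳ), predicting at this data set's mean aptitude of 40 returns its mean performance of 31.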
Without the relationship between aptitude and performance and without the regression equation, the best prediction would've been y-bar, 31.9. So, knowledge of the relationship between aptitude score and performance enables a more precise estimate for an applicant. Knowledge is power.