Data Matters with Excel®
Activity 9.2
Section 9.2 proposes that a way to find a line that does a good job estimating y-values from x-values is to find the line that goes through the mean of the y-values at each x-value, if thats possible.
There are seven steps to testing this claim.
- Pick two x-values.
- Have your software select several random y-values for each x-value.
- Calculate the regression line that goes through the means of the y-values of each x-value.
- Use your software to calculate the mistakes the estimate makes.
- Add up all the mistakes the regression line makes.
- Does the line go through the mean-mean point?
- Shift the line by changing the slope and check how that affects the sum of the lines errors.
Heres how to do these steps.
Step 1: Pick two x-values.
Label column A and column B. Call them X and Y. Pick two values for x and type several of each into the cells in column A.
Step 2: Have your software select several random y-values for each x-value.
If you set Excel to select the y-values with its random number generator, =rand(), Excel will repeatedly replace the y-values as you try to work with them. So for this project, it will be fine to simply type in some numbers chosen at random. You dont need a truly random sample here.
Step 3: Calculate the regression line that goes through the means of the y-values of each x-value.
You will need the mean of y for each value of x. To get a mean y-value, select an empty cell in column E, type =average( , click and drag to select the values you need the mean of, type ) and press Enter.
Calculate the equation of a line from the mean of y for one x-value to the mean of the y-values of the other x-value. To do this, you are calculating the equation for a line between two points. The x-values are the x-values of your data. The y-values are the means of the y-values at each x-value. (Hint: You can start by finding the sloperise over run.)
Step 4: Use your software to calculate the mistakes the estimate makes.
Label column C Estimates. Enter your regression equation into C2 and drag it down to fill all the rows. Type an asterisk for multiplication and refer to the x-value as A2. For example, if your regression equation is y=3x-2, then enter =3*A2-2 .
Label column D Error. Enter its formula, =B2-C2 .
Step 5: Add up all the mistakes the regression line makes.
Select the cell below the column of errors. Click on the summation symbol and press Enter.
Step 6: Does the line go through the mean-mean point?
To get the mean x-value and mean y-value, you can put their means underneath each column. Select the cell below the x-values. Type =average( , select the x-values, type ) , and press Enter. You can drag that equation under the y-values to get the mean of the y-values.
Is the mean-mean point a point on the line described by your regression equation?
Step 7: Shift the line by changing the slope and check how that affects the sum of the lines errors.
Go to cell C2. Edit the formula, changing the slope and/or intercept. After each change, drag the equation down to fill all the lines of data. How do your changes affect the sum of the errors?
| ©2008 Key College Publishing. All rights reserved. |
|