Data Matters with Fathom! Dynamic Statistics software
Activity 9.2
Section 9.2 proposes that one way to find a line that does a good job of estimating y-values from x-values is to find the line that goes through the mean of the y-values at each x-value, if that's possible.
There are seven steps to testing this claim.
- Pick two x-values.
- Have your software select several random y-values for each x-value.
- Calculate the regression line that goes through the means of the y-values of each x-value.
- Use your software to calculate the mistakes the estimate makes.
- Add up all the mistakes the regression line makes.
- Does the line go through the mean-mean point?
- Shift the line by changing the slope and check how that affects the sum of the line's errors.
Here's how to do these steps.
Step 1: Pick two x-values.
Drag a case table onto the workspace. Add two attributes, x and y. Decide on two x-values that you want to work with. Enter several cases of each x-value.
Step 2: Have your software select several random y-values for each x-value.
Pick your own random y-values. (If you use the random number generators, Fathom will continue replacing them as you try to work with them.)
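Outside Fathom, the same idea (random values that stay put while you work with them) can be sketched in Python by drawing the values once from a seeded generator. The two x-values, the range 0 to 20, the four cases per x-value, and the seed are all assumptions chosen for illustration, not part of the activity.

```python
import random

# A fixed seed means the same y-values come back on every run,
# so they do not change underneath you (seed 9 is an arbitrary choice).
rng = random.Random(9)

x_values = [2, 6]  # hypothetical pair of x-values

# Four random integer y-values between 0 and 20 for each x-value.
cases = [(x, rng.randint(0, 20)) for x in x_values for _ in range(4)]

for x, y in cases:
    print(x, y)
```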
Step 3: Calculate the regression line that goes through the means of the y-values of each x-value.
Fathom won't calculate the mean of y for each value of x as long as x is a numeric attribute. To get the means for each value of x, add a new attribute, xCategorical. Fill xCategorical with letters, one letter for each value of x.
Drag a new summary table onto the workspace. Drag xCategorical to the down-arrow and y to the right-arrow. You will have to expand the summary table to see all of it. Click on a corner and drag to make the table's window larger.
Calculate the equation of the line that runs from the mean of the y-values at one x-value to the mean of the y-values at the other x-value.
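As a check on the hand calculation, here is a minimal Python sketch of the same step. It groups the y-values by x, takes the mean of each group, and finds the line through the two (x, mean y) points. The data values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hand-picked data: four cases at x = 2 and four at x = 6.
cases = [(2, 3), (2, 5), (2, 4), (2, 8),
         (6, 10), (6, 14), (6, 11), (6, 13)]

# Group the y-values by x, much as the summary table does with xCategorical.
groups = defaultdict(list)
for x, y in cases:
    groups[x].append(y)

group_means = {x: mean(ys) for x, ys in groups.items()}

# Line through the two (x, mean y) points: slope is rise over run,
# and the intercept follows from either point.
(x1, m1), (x2, m2) = sorted(group_means.items())
slope = (m2 - m1) / (x2 - x1)
intercept = m1 - slope * x1

print("group means:", group_means)
print(f"line: y = {intercept} + {slope}x")
```

For this made-up data the group means are 5 and 12, so the slope is (12 - 5) / (6 - 2) = 1.75 and the intercept is 5 - 1.75 * 2 = 1.5.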
Step 4: Use your software to calculate the mistakes the estimate makes.
Add a new attribute, Error, to the case table. Set its formula to y minus the right-hand side of the line equation that you calculated. For example, if your equation was y = 3 - 2x, then the formula would be y - 3 + 2x, which is the same as y - (3 - 2x).
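The Error attribute can be sketched in Python as one subtraction per case. The data and the line y = 1.5 + 1.75x are hypothetical. The line was chosen so that it passes through the mean y-value at each of the two x-values in this data.

```python
# Hypothetical data: four cases at x = 2 (mean y = 5) and four at x = 6 (mean y = 12).
cases = [(2, 3), (2, 5), (2, 4), (2, 8),
         (6, 10), (6, 14), (6, 11), (6, 13)]

# Assumed line through the two group means: y = 1.5 + 1.75x.
intercept, slope = 1.5, 1.75

def predicted(x):
    """Estimated y-value on the line at this x."""
    return intercept + slope * x

# Error = actual y minus the line's estimate, one value per case.
errors = [y - predicted(x) for x, y in cases]

for (x, y), e in zip(cases, errors):
    print(f"x={x}  y={y}  estimate={predicted(x)}  error={e}")
```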
Step 5: Add up all the mistakes the regression line makes.
The easiest way to do this is to add another attribute, sumErrors. Its formula is sum(Error). In what way do errors balance each other in a regression line? What is the average error?
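To compare against your Fathom result, here is a small Python sketch that totals the errors overall and within each x-value. The data and the line y = 1.5 + 1.75x are assumptions for illustration. The line passes through the mean y-value at each x-value of this made-up data.

```python
from statistics import mean

# Hypothetical data: four cases at x = 2 (mean y = 5) and four at x = 6 (mean y = 12).
cases = [(2, 3), (2, 5), (2, 4), (2, 8),
         (6, 10), (6, 14), (6, 11), (6, 13)]

# Assumed line through the two group means.
intercept, slope = 1.5, 1.75

errors = [(x, y - (intercept + slope * x)) for x, y in cases]

# Sum of the errors at each x-value separately.
sum_by_x = {}
for x, e in errors:
    sum_by_x[x] = sum_by_x.get(x, 0) + e

sum_errors = sum(e for _, e in errors)
average_error = mean(e for _, e in errors)

print("sum of errors at each x:", sum_by_x)
print("sum of all errors:", sum_errors)
print("average error:", average_error)
```

Because the line sits at the mean of each group, the positive and negative errors cancel within each x-value, so the overall sum and the average error are both zero for this data.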
Step 6: Does the line go through the mean-mean point?
To get the mean x-value and mean y-value, drag a New Estimate window onto the workspace, select Estimate Mean, and drag x and y onto the Estimate Window. Is the mean-mean point a point on the regression line?
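The same check can be sketched in Python: compute the mean of all x-values and the mean of all y-values, then see whether the line gives that mean y at the mean x. The data and the line y = 1.5 + 1.75x (which passes through both group means of this data) are assumptions.

```python
import math
from statistics import mean

# Hypothetical data: four cases at x = 2 (mean y = 5) and four at x = 6 (mean y = 12).
cases = [(2, 3), (2, 5), (2, 4), (2, 8),
         (6, 10), (6, 14), (6, 11), (6, 13)]

# Assumed line through the two group means.
intercept, slope = 1.5, 1.75

mean_x = mean(x for x, _ in cases)
mean_y = mean(y for _, y in cases)

# Height of the line at the mean x-value.
line_at_mean_x = intercept + slope * mean_x

on_line = math.isclose(line_at_mean_x, mean_y)

print("mean-mean point:", (mean_x, mean_y))
print("line at mean x:", line_at_mean_x)
print("mean-mean point is on the line:", on_line)
```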
Step 7: Shift the line by changing the slope and check how that affects the sum of the line's errors.
Right-click on Error and edit its formula by changing the slope of the equation that is subtracted. How does that change the errors?
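A rough Python sketch of this experiment keeps the intercept fixed and recomputes the sum of the errors for a few slopes. The data, the intercept 1.5, and the trial slopes are all hypothetical. For this data, the slope 1.75 is the one that sends the line through both group means.

```python
# Hypothetical data: four cases at x = 2 (mean y = 5) and four at x = 6 (mean y = 12).
cases = [(2, 3), (2, 5), (2, 4), (2, 8),
         (6, 10), (6, 14), (6, 11), (6, 13)]

intercept = 1.5  # held fixed while the slope changes

def sum_errors(slope):
    """Sum of (actual y minus estimated y) over all cases for a given slope."""
    return sum(y - (intercept + slope * x) for x, y in cases)

# Try a slope below, at, and above the slope through the group means.
for trial_slope in (1.5, 1.75, 2.0):
    print(f"slope {trial_slope}: sum of errors = {sum_errors(trial_slope)}")
```

With this data, a slope that is too shallow leaves a positive sum of errors (the line runs below most points), and a slope that is too steep leaves a negative sum.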
©2008 Key College Publishing. All rights reserved.