Data Matters with SPSS®
Activity 7.3
Section 7.3 suggests two ways to estimate the population variance from data in groups. One way looks at how the group means vary. The other way looks at how the observations within each group vary from the group mean. In this project, you will collect these estimates from many random samples from RepUSSample.sav. You can then explore the estimates and see whether they seem to work well.
To do this, you will aggregate twice: once to gather the mean and sum of squares of each group, and a second time to put the groups together and calculate the two variance estimates.
Use the Syntax program from Section 7.2.
INPUT PROGRAM.
LOOP #Sample = 1 TO 1000.
LOOP #Case = 1 to 9.
COMPUTE sample = #Sample.
COMPUTE measure = RV.NORMAL(0,7).
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
Replace 1000 on the second line with the number of groups you want. Replace 9 on the third line with the number of observations you want in each group. Replace 7 with whatever standard deviation you choose for the population. Square your standard deviation to get the population variance; with a standard deviation of 7, the population variance is 49.
Next comes the first aggregation. It would be great to save the sum of squared deviations directly, but SPSS doesn't offer that as an aggregate function. Instead, you will save the standard deviation and then calculate the sum of squared deviations from it.
In the Data Editor, click on Data, Aggregate. Use sample as the break variable. Put measure into the Aggregate Variable(s) box; the default function is the mean.
To keep things clear, click on Name & Label and replace the name with mean. Click on Continue.
You also need the standard deviation. Put measure into the Aggregate Variable(s) box a second time, click on Function and select Standard Deviation. Click on Continue. Click on Name & Label and replace the name with SD. Click on Continue, Replace working data, OK.
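For readers who prefer syntax, the dialog steps above correspond roughly to the AGGREGATE command below. This is a sketch rather than the exact text SPSS pastes, which varies by version; /OUTFILE=* plays the role of Replace working data.

```spss
* First aggregation: one case per group, keeping the group mean and SD.
AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /mean=MEAN(measure)
  /SD=SD(measure).
```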
Before proceeding, you need to calculate the sum of squared deviations. To get it, undo the steps that produced the standard deviation: square the standard deviation to get the variance, then multiply the variance by one less than the number of observations in each group. The result is the sum of the squared differences between each observation and its group's mean. You will need that to estimate the variance from the variation within each group.
To do the calculations, click on Transform, Compute. Name the new variable SS (for sum of squares). The numeric expression is (SD**2)*([group size]-1). In SPSS, ** means "raised to the power of," so SD**2 means SD squared.
Replace the part in brackets with your group size.
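The same step in syntax, assuming groups of 9 as in the program above (change 9 if you chose a different group size):

```spss
* Sum of squares within each group: variance times (n - 1).
* Assumes 9 observations per group.
COMPUTE SS = (SD**2)*(9-1).
EXECUTE.
```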
For the next aggregation, reset the sample variable. Click on Transform, Compute. Enter sample for the variable, and the numeric expression TRUNC(($CASENUM-1)/2) .
If you want to combine three groups, use TRUNC(($CASENUM-1)/3) .
After you make that transformation, look at the data to see what that numeric expression did.
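In syntax, the renumbering and a quick look at the first few cases might read as follows. LIST prints the cases to the Viewer, so you can check the new sample numbers without scrolling through the Data Editor.

```spss
* Give consecutive pairs of groups the same sample number (0, 0, 1, 1, and so on).
COMPUTE sample = TRUNC(($CASENUM-1)/2).
EXECUTE.
* Show the first six cases to see the new numbering.
LIST VARIABLES=sample mean SD SS /CASES=FROM 1 TO 6.
```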
Click on Data, Aggregate. Sample is the break variable. Put mean into the Aggregate Variable(s) box. Click on Function and select Standard Deviation. Click on Continue. To keep things clear, click on Name & Label and replace the name with SDmeans. Click on Continue.
Put SS into the Aggregate Variable(s) box. Click on Function and select Sum. Click on Continue. To keep things clear, click on Name & Label and replace the name with SSwithin. Click on Continue.
Click on Replace working data, OK.
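A syntax sketch of this second aggregation, using the variable names chosen above:

```spss
* Second aggregation: one case per combined sample.
AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /SDmeans=SD(mean)
  /SSwithin=SUM(SS).
```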
Now you have the sum of squared deviations of each observation from that observation's group mean. To estimate the variance, divide that sum by its degrees of freedom. To find the degrees of freedom, first figure out how many observations there are in each sample. (The number of observations is the number of groups times the number of observations in each group.) Then subtract the number of groups.
Use Transform, Compute to compute VARwith by dividing SSwithin by the degrees of freedom.
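A syntax sketch, assuming two groups of 9 per sample, so the degrees of freedom are 2*9 - 2 = 16:

```spss
* Within-groups variance estimate.
* Assumes 2 groups of 9 per sample, so df = 2*9 - 2 = 16.
COMPUTE VARwith = SSwithin/(2*9-2).
EXECUTE.
```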
Use Explore to see how the variance estimates from variation within each group compare with the variance you used in setting up the Syntax program.
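Explore corresponds to the EXAMINE command. A minimal sketch follows; compare the mean in the Descriptives table with the population variance (49 if you kept a standard deviation of 7).

```spss
* Explore the within-groups variance estimates.
EXAMINE VARIABLES=VARwith
  /PLOT BOXPLOT HISTOGRAM
  /STATISTICS DESCRIPTIVES.
```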
You have more calculations to do before you have the variance estimated from the variation between groups. If you square SDmeans and multiply by one less than the number of groups you are combining, you get the sum of the squared differences between each group mean and the overall mean. Multiply that by the number of observations in each group, and you have the sum of squares for estimating the variance from the variation between groups. Use Transform, Compute to calculate SSb with this numeric expression (replacing the bracketed parts): (SDmeans**2)*([number of groups]-1)*[number in each group].
The variance estimated from the variation between groups can then be calculated with the numeric expression SSb/([number of groups]-1).
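Both steps as syntax, assuming two groups of 9 per sample. The name VARbet is only a suggestion (the activity does not name this variable), so use whatever name you like.

```spss
* Between-groups sum of squares and variance estimate.
* Assumes 2 groups of 9 per sample; VARbet is a hypothetical name.
COMPUTE SSb = (SDmeans**2)*(2-1)*9.
COMPUTE VARbet = SSb/(2-1).
EXECUTE.
```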
In the last two steps, you multiplied and then divided by one less than the number of groups. You could skip both: as long as all groups have the same number of observations, the estimated variance is simply the variance of the group means times the number of observations in each group.
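The shortcut follows directly from the definitions. In the notation below, which is introduced here only for illustration, k is the number of groups combined in each sample, n is the number of observations in each group, ybar_j are the group means, ybar is their mean (equal to the overall mean when the groups are the same size), and s_means is SDmeans.

```latex
s_{\text{means}}^2 = \frac{1}{k-1}\sum_{j=1}^{k}\left(\bar{y}_j - \bar{y}\right)^2,
\qquad
SS_b = n\sum_{j=1}^{k}\left(\bar{y}_j - \bar{y}\right)^2 = n\,(k-1)\,s_{\text{means}}^2,
\qquad
\hat{\sigma}_b^2 = \frac{SS_b}{k-1} = n\,s_{\text{means}}^2 .
```

With k = 2 and n = 9, for example, the between-groups estimate is 9 times SDmeans squared.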
Explore these variance estimates.
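To explore both estimates side by side in syntax, the sketch below computes the between-groups estimate with the shortcut (again assuming groups of 9; VARbet is a hypothetical name) and runs EXAMINE on both variables. If you already computed VARbet the two-step way, this overwrites it with the same values, up to rounding.

```spss
* Between-groups estimate via the shortcut: n times the squared SD of the group means.
* Assumes 9 observations per group; VARbet is a hypothetical name.
COMPUTE VARbet = 9*(SDmeans**2).
EXECUTE.
EXAMINE VARIABLES=VARwith VARbet
  /PLOT BOXPLOT HISTOGRAM
  /STATISTICS DESCRIPTIVES.
```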
How did the estimated variances compare with the real variance of the population? Do the claims in Section 7.3 seem sensible?
Try different sample sizes. Would you like to propose any warnings for these variance estimates?
Save Your Work
You will be using this data in the next project.
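A one-line syntax equivalent of File, Save As. The file name below is a hypothetical example, so substitute your own name and folder.

```spss
* Save the aggregated samples for the next project.
SAVE OUTFILE='Activity7_3.sav'.
```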
©2008 Key College Publishing. All rights reserved.