Data Matters with SPSS®
Activity 6.2
One feature of the standard deviation is especially puzzling. To calculate it, we square the deviations and then summarize them. But rather than simply averaging them, we sum them and divide by one less than the sample size.
Section 6.2 claims that we divide by one less than the sample size rather than by the sample size in order to get a better estimate of the population's variance (the square of the population's standard deviation). The reason given is that the sample's mean is closer to the sample's observations than the population's mean is, so the deviations tend to be smaller than what you would see in the total population.
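In symbols, the two candidate statistics and the reason for the claim can be written as follows. The notation is an assumption introduced here for illustration and does not come from Section 6.2: x_i are the n sample values, x-bar is the sample's mean, mu is the population's mean, and MSD stands for the mean squared deviation.

```latex
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\qquad
\mathrm{MSD} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
\]
\[
\sum_{i=1}^{n}(x_i-\mu)^2
  = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2
\]
```

The second line shows that squared deviations measured from the sample's mean can never add up to more than squared deviations measured from the population's mean, which is why averaging them over n tends to come out too small.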
In this project, you are going to check that claim by using the representative U.S. sample as the population again. Here are the steps.
- Pick a numeric variable and get the population variance of that variable. (Remember that these 50,000 are our population for the moment.)
- Pick a sample size, take random samples, and calculate the average squared deviation and the variance.
- Explore the two statistics to see which does a better job.
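As a cross-check outside SPSS, the three steps above can be sketched in Python. This is only an illustration under stated assumptions: the population is 50,000 synthetic normal values rather than RepUSSample.sav, and the function name compare_estimators, the sample size of 5, and the 2,000 samples are all hypothetical choices.

```python
import random
import statistics


def compare_estimators(population, sample_size, num_samples, seed=0):
    """Return (population variance, mean sample variance, mean of the
    divide-by-n averages) for non-overlapping random samples."""
    rng = random.Random(seed)
    # Step 1: variance of the whole population (divide by N).
    pop_var = statistics.pvariance(population)

    # Step 2: shuffle, then cut the first cases into equal-sized samples.
    shuffled = population[:]
    rng.shuffle(shuffled)
    variances = []
    mean_sq_devs = []
    for i in range(num_samples):
        sample = shuffled[i * sample_size:(i + 1) * sample_size]
        variances.append(statistics.variance(sample))      # divide by n - 1
        mean_sq_devs.append(statistics.pvariance(sample))  # divide by n

    # Step 3: summarize both statistics so they can be compared.
    return pop_var, statistics.mean(variances), statistics.mean(mean_sq_devs)


# Synthetic stand-in for the population (an assumption, not the real data).
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(50000)]

pop_var, mean_var, mean_msd = compare_estimators(population, 5, 2000)
print("population variance:        ", round(pop_var, 2))
print("mean sample variance (n-1): ", round(mean_var, 2))
print("mean squared deviation (n): ", round(mean_msd, 2))
```

Comparing the three printed numbers is the code analogue of Step 3: whichever average sits closer to the population variance is the better estimator for this population and sample size.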
Here's how to do these steps.
Step 1: Pick a numeric variable and get the population variance of that variable.
Open RepUSSample.sav. Select Analyze, Descriptive Statistics, Descriptives. Double-click the variable that you want to work with, click on Options, click on the checkbox next to Variance, then click on Continue, OK.
Step 2: Pick a sample size, take random samples, and calculate the average squared deviation and the variance.
To do this, you will use the same strategy you used for the project in Section 5.1. You will sort randomly, identify groups of observations as samples, and aggregate.
Sorting Randomly
Select Transform, Compute. Enter random for the new target variable and RV.UNIFORM(0,100) for the numeric expression, then click OK.
Click on Data, Sort Cases. Double-click on random and click OK.
Adding the Sample Variable
Before you proceed, you need to choose your sample size. It has to be greater than 1, because there is no variance or standard deviation for samples of size 1. You might also consider that the difference between the mean squared deviation and the variance is smaller with larger sample sizes.
Select Transform, Compute. Name the new target variable sample. Click on If, Include if case satisfies condition: and enter the condition $CASENUM <= [number of samples]*[sample size]. (Replace the bracketed parts with your number of samples and sample size.)
Click on Continue and enter the numeric expression TRUNC(($CASENUM-1)/[sample size]) (replacing the bracket part with your sample size).
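To see what the condition and the TRUNC expression do together, here is a minimal Python sketch. It assumes, as in SPSS, that case numbers start at 1; the function name assign_samples and the tiny example of 10 cases are hypothetical, and None stands in for the system-missing value SPSS shows as a period.

```python
def assign_samples(num_cases, num_samples, sample_size):
    """Label each case with a sample number, mirroring the Compute step."""
    labels = []
    for casenum in range(1, num_cases + 1):  # $CASENUM starts at 1
        if casenum <= num_samples * sample_size:
            # TRUNC(($CASENUM-1)/[sample size]) is integer division here.
            labels.append((casenum - 1) // sample_size)
        else:
            # Cases beyond the last sample get no label (missing).
            labels.append(None)
    return labels


# 10 cases, 3 samples of size 3: the tenth case is left unassigned.
labels = assign_samples(num_cases=10, num_samples=3, sample_size=3)
print(labels)
```

Subtracting 1 before dividing is what keeps each block of consecutive cases together: without it, the last case of every block would spill into the next sample.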
To get the variances, click on Data, Aggregate. Scroll down and click on sample, then click on the triangle next to the Break Variable(s) box. Select the variable you are interested in and click on the triangle next to the Aggregate Variable(s) box.
Select Function, Standard Deviation, Continue. Click on Replace working data file, OK.
SPSS also returns a summary row for the cases you didn't assign to a sample. It appears at the top and has a . (a period) for sample. You can select and delete that row.
You now have a column of standard deviations. Use Transform, Compute to add a variance variable. For the numeric expression, use [standard deviation variable]**2, replacing the bracketed part with the name of your standard deviation variable. (**2 means raised to the power of 2, which is the same as squared.)
Now you need to get the mean squared deviation. Here is an easy way: Multiply the variance by one less than the sample size, which gives you the sum of squared deviations. Then divide by the sample size, which gives you the mean squared deviation. The formula is Variance*([your sample size]-1)/[your sample size]. Put in your own sample size where noted in brackets.
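The shortcut can be checked on a small example. The following Python sketch uses five made-up values (an assumption for illustration only) and compares the converted figure with a mean squared deviation computed directly from the deviations; the function name is hypothetical.

```python
import statistics


def msd_from_variance(variance, n):
    """Convert an n-1 variance to the mean squared deviation (divide by n)."""
    sum_of_squares = variance * (n - 1)  # undo the division by n - 1
    return sum_of_squares / n            # then average over all n cases


sample = [2.0, 4.0, 4.0, 6.0, 9.0]  # made-up values
n = len(sample)

variance = statistics.variance(sample)  # divides by n - 1
converted = msd_from_variance(variance, n)

# Direct calculation: average the squared deviations from the sample mean.
mean = sum(sample) / n
direct = sum((x - mean) ** 2 for x in sample) / n

print(variance, converted, direct)
```

The two routes give the same number, so there is no need to go back to the raw deviations once the variance is known.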
Step 3: Explore the two statistics to see which does a better job.
Use Explore (in Analyze, Descriptive Statistics) to explore the variances and the mean squared deviations. Which seems to be better at estimating the population's variance?
Something to Think About
Look at the sampling distribution of variances. It isn't symmetrical. That means that although the mean variance is a good estimate, most of the time the variance is too low. Often it is way too low. That causes special problems, as we will see later.
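One way to look at this lopsidedness is to count how often a sample variance falls below the population variance. The Python sketch below does that under stated assumptions: a synthetic normal population of 20,000 values, samples of size 5, and 4,000 samples. The function name fraction_below is hypothetical, and a different population or sample size would give a different fraction.

```python
import random
import statistics


def fraction_below(population, sample_size, num_samples, seed=0):
    """Fraction of sample variances that fall below the population variance."""
    rng = random.Random(seed)
    pop_var = statistics.pvariance(population)
    below = 0
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)  # draw without replacement
        if statistics.variance(sample) < pop_var:     # n - 1 variance
            below += 1
    return below / num_samples


# Synthetic stand-in population (an assumption, not the real data).
pop_rng = random.Random(2)
population = [pop_rng.gauss(0, 1) for _ in range(20000)]

frac = fraction_below(population, sample_size=5, num_samples=4000)
print("share of sample variances below the population variance:", frac)
```

A share above one half, alongside a mean that is close to the population variance, is what a right-skewed sampling distribution looks like: many estimates that are somewhat low, balanced by a few that are much too high.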
©2008 Key College Publishing. All rights reserved.