Data Matters with Fathom! Dynamic Statistics™ software

Activity 6.2

One feature of the standard deviation is especially puzzling. To calculate it, we square the deviations and then summarize them. Rather than simply averaging the squared deviations, we sum them and divide by one less than the sample size.

Section 6.2 claims that we divide by one less than the sample size rather than by the sample size in order to get a better estimate of the population’s variance (the square of the population’s standard deviation). The reason given is that the sample’s mean is closer to the sample’s values than the population’s mean is, so the deviations from the sample’s mean tend to be smaller than the deviations you would see from the population’s mean.
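The claim can be checked on a small scale before opening Fathom. The sketch below is written in Python rather than Fathom and uses a made-up population of random values, not the Rep US Sample data. The names sum_sq_dev, population, and sample are illustrative only.

```python
import random

def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a given center."""
    return sum((v - center) ** 2 for v in values)

rng = random.Random(1)

# Hypothetical stand-in population; these are not the Rep US Sample values.
population = [rng.gauss(50, 10) for _ in range(50_000)]
pop_mean = sum(population) / len(population)

# One small random sample and its own mean.
sample = rng.sample(population, 5)
sample_mean = sum(sample) / len(sample)

# The same five values, measured from two different centers.
around_sample_mean = sum_sq_dev(sample, sample_mean)
around_pop_mean = sum_sq_dev(sample, pop_mean)

print("around the sample's mean:    ", round(around_sample_mean, 2))
print("around the population's mean:", round(around_pop_mean, 2))
```

The sample's mean is the center that makes the sum of squared deviations as small as possible for that sample, so the first sum cannot exceed the second, apart from rounding.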

In this project, we are going to check that claim by using the representative U.S. sample as our population again. Here are the steps.

  1. Pick a numeric attribute and get the population variance of that attribute. (Remember that these 50,000 are our population for the moment.)
  2. Pick a sample size, take random samples, and calculate the average squared deviation and the variance.
  3. Explore the two statistics to see which does a better job.

Here’s how to do these steps.

Step 1: Pick a numeric attribute and get the population variance of that attribute.

Open Rep US Sample. Select Analyze, Estimate Parameters, Empty Estimate, Estimate Mean. Drag the numeric attribute you want to work with onto Attribute (continuous): <unassigned>. The display shows the standard deviation. You can square that to get the variance.
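For readers who want to see the arithmetic outside Fathom, here is a minimal Python sketch of the same step. The eight values are made up for illustration and stand in for one numeric attribute.

```python
import statistics

# Hypothetical stand-in for one numeric attribute of the population.
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation treating the data as the whole population (divide by N).
pop_sd = statistics.pstdev(population)

# Squaring the standard deviation gives the variance.
pop_variance = pop_sd ** 2

print("standard deviation:", pop_sd)
print("variance:", pop_variance)
```

With 50,000 cases, dividing by N or by N - 1 changes the variance only by the factor 49,999/50,000, so the choice of divisor hardly matters for the population value.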

Step 2: Pick a sample size, take random samples, and calculate the average squared deviation and the variance.

Select the Collection, then Analyze, Sample Cases. Select the Sample Collection and press “Ctrl-I” to get the Collection Inspector. Set your sample size where you want it (but above 1, because there is no variance or standard deviation for samples of size 1). Keep in mind that the difference between the mean squared deviation and the variance is smaller with larger sample sizes, so a small sample size makes the comparison easier to see. Set Animation as you want it. Click on the Measures tab and enter a measure, Var, with the formula sampleVariance([your attribute]), replacing [your attribute] with the name of the attribute whose population variance you found in Step 1.

Enter another new measure. Call this one MSD (for “mean squared deviation”). Here is an easy way to get the mean squared deviation: Multiply the variance by one less than the sample size. That gives you the sum of squares. Then divide by the sample size. That gives you the mean squared deviation. The formula is sampleVariance([your attribute]) * ([your sample size] - 1) / [your sample size].

Where noted in brackets, put in your sample size and your attribute name.
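The shortcut for the mean squared deviation can be verified against the direct calculation. This Python sketch uses four made-up values. The function name mean_squared_deviation is hypothetical and is not a Fathom function.

```python
import statistics

def mean_squared_deviation(sample):
    """Turn the sample variance (divide by n - 1) into the mean squared deviation (divide by n)."""
    n = len(sample)
    return statistics.variance(sample) * (n - 1) / n

sample = [3, 7, 8, 12]  # made-up values
n = len(sample)

# Direct route: average the squared deviations from the sample's mean.
mean = sum(sample) / n
direct_msd = sum((x - mean) ** 2 for x in sample) / n

# Shortcut route: variance times (n - 1), divided by n.
shortcut_msd = mean_squared_deviation(sample)

print(direct_msd, shortcut_msd)
```

Both routes give the same number because multiplying the variance by n - 1 recovers the sum of squared deviations.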

To get multiple samples, select Analyze, Collect Measures. Drag a case table onto the workspace to see the measures. Inspect the Measures Collection to set the number of measures and Animation.
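Collecting measures amounts to repeating the sampling and recording both statistics each time. A rough Python equivalent is sketched below. The population is made up, the function name collect_measures is hypothetical, and drawing cases with replacement is an assumption about how the sampler is set.

```python
import random
import statistics

def collect_measures(population, sample_size, num_samples, rng):
    """Return one (Var, MSD) pair for each random sample."""
    measures = []
    for _ in range(num_samples):
        # Assumption: cases are drawn with replacement.
        sample = rng.choices(population, k=sample_size)
        var = statistics.variance(sample)            # divides by n - 1
        msd = var * (sample_size - 1) / sample_size  # divides by n
        measures.append((var, msd))
    return measures

rng = random.Random(2)
# Hypothetical stand-in population; not the Rep US Sample data.
population = [rng.gauss(50, 10) for _ in range(10_000)]

measures = collect_measures(population, sample_size=5, num_samples=200, rng=rng)

# Show the first few rows, much like a case table of measures.
for var, msd in measures[:3]:
    print(round(var, 2), round(msd, 2))
```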

Step 3: Explore the two statistics to see which does a better job.

Use means, medians, and histograms to explore the variances and the mean squared deviations. Which seems to do a better job estimating the population’s variance?
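The same comparison can be sketched in Python. The population here is made up (random values, not the Rep US Sample data), and the sample size of 5 and the 2,000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical stand-in population and its variance (divide by N).
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_variance = statistics.pvariance(population)

n = 5  # sample size, chosen small so the two statistics differ visibly
variances = [statistics.variance(rng.choices(population, k=n))
             for _ in range(2_000)]
msds = [v * (n - 1) / n for v in variances]

print("population variance:", round(pop_variance, 1))
for name, values in (("Var", variances), ("MSD", msds)):
    print(name,
          "mean:", round(statistics.mean(values), 1),
          "median:", round(statistics.median(values), 1))
```

In runs like this, the mean of Var tends to land near the population variance, while the mean of MSD tends to fall short by roughly the factor (n - 1)/n.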

Something to Think About

Look at the sampling distribution of variances. It isn’t symmetrical: it has a long tail to the right. That means that although the mean of the sample variances is a good estimate of the population’s variance, most of the time an individual sample’s variance is too low. Often it is way too low. That causes special problems, as we will see later.
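One way to put a number on this lopsidedness is to count how often a sample's variance falls below the population's variance. The Python sketch below does that for a made-up population. The sample size and number of samples are arbitrary, and the name share_below is illustrative.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical stand-in population; not the Rep US Sample data.
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_variance = statistics.pvariance(population)

n = 5
variances = [statistics.variance(rng.choices(population, k=n))
             for _ in range(2_000)]

# Fraction of samples whose variance underestimates the population variance.
share_below = sum(v < pop_variance for v in variances) / len(variances)

print("mean of sample variances:  ", round(statistics.mean(variances), 1))
print("median of sample variances:", round(statistics.median(variances), 1))
print("share below population variance:", round(share_below, 3))
```

A median that sits below the mean, together with a share above one half, is the pattern the paragraph above describes.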

There is a function, populationVariance(), that you could have used in creating the measures. If you add it to the sample’s measures, you will see how it compares with the variance and the mean squared deviation.


©2008 Key College Publishing. All rights reserved.