Data Matters with Excel®
Activity 6.2
One feature of the standard deviation is most puzzling. In calculating the standard deviation, we get the squared deviations and we summarize them. Rather than simply averaging them, we sum them and divide by one less than the sample size.
Section 6.2 claims that we divide by one less than the sample size rather than by the sample size in order to get a better estimate of the populations variance (the square of the populations standard deviation). The reason Section 6.2 claims this is that the samples mean is closer to the samples proportions than the populations mean, so the deviations tend to be smaller than what you would see in the total population.
In this project, we are going to check that claim, by using the representative U.S. sample as our population again. Here are the steps.
- Pick a numeric variable and get the population variance of that variable. (Remember that these 50,000 are our population for the moment.)
- Pick a sample size, take random samples, and calculate the average squared deviation and the variance.
- Explore the two statistics to see which does a better job.
Heres how to do these steps.
Step 1: Pick a numeric variable and get the population variance of that variable.
Open the workbook you used for the project in Section 5.2. (You could also do this project with RepUSSample.xls, but that will require your copying the macro from Section 5.2.) Select an empty cell to the right of the data, type =var( , select the column of the data you are interested in, type ) , and press Enter. That puts the variance of your column in that cell.
Step 2: Pick a sample size, take random samples, and calculate the average squared deviation and the variance.
Using the same strategy you used in the Section 5.2 project, you will take samples into your Sample worksheet and calculate and store the statistics youre interested in on a third worksheet.
Add a worksheet, Stats. Label column A Variance, column B Mean Squared Error, column C Variances, and column D Mean Squared Errors.
Look at the data and note the column of the data you are interested in. In cell A2 in your Stats worksheet, enter the equation =VAR(Sample!J:J) replacing the J:J with C:C or F:F or whatever is the column of the data youre interested in.
Before you proceed, you need to choose your sample size. Your sample size will have to be greater than 1. There is no variance or standard deviation for samples of size 1. You might also consider that the difference between the mean squared deviation and the standard deviation is smaller with larger sample sizes.
Now put the mean squared deviation in B2. An easy way to get the mean squared deviation is to multiply the variance by one less than the sample size (that gives you the sum of squares) then divide by the sample size (that gives you the mean squared deviation). The formula is =A2*([your sample size]-1)/[your sample size] . (Put in your own sample size where noted in brackets).
Here is the macro that will create the samples and record the variance and mean squared deviation of each sample.
Sub Sampling()
Randomize Timer
For r = 2 To 55 Set 55 to the number of samples you want.
For SampleRow = 2 To 4 Set 4 to one over your sample size.
dataRow = 2 + Fix(50000 * Rnd)
Worksheets("Data").Rows(dataRow).Copy
Worksheets("Sample").Rows(SampleRow).PasteSpecial
Next SampleRow
Worksheets("Stats").Cells(r, 3).Value = Cells(2, 1).Value
Worksheets("Stats").Cells(r, 4).Value = Cells(2, 2).Value
Next r
End Sub
|
This is roughly the same macro you have seen before. The difference (in bold) is that two cells get copied to record the statistics.
Step 3: Explore the two statistics to see which does a better job.
Use means, medians, and histograms to explore the variances and the mean squared deviations. Which seems to do a better job estimating the populations variance?
Something to Think About
Look at the sampling distribution of variances. It isnt symmetrical. That means that although the mean variance is a good estimate, most of the time the variance is too low. Often it is way too low. That causes special problems, as we will see later.
Save Your Work
In the next project, you will use the workbook you worked with in this project.
| ©2008 Key College Publishing. All rights reserved. |
|