General Information
  Home
Author Bio
Product/Purchase Info

Instructor Resources
Registration Required
  Register
Download Instructor Resources

Computer Activities
and Data Sets
  Table of Contents
Excel
Fathom
SPSS

Community
  Contact the Author
Ideas/Comments for Publisher
Testimonials
Coming soon!

Other Key Sites
  Key Curriculum Press

Key College Publishing

Data Matters with Excel®

Activity 7.3

Section 7.3 suggests two ways to estimate the population variance from data in groups. One way looks at how the group means vary. The other way looks at how the observations within each group vary from the group mean. In this project, you will collect these estimates from many random samples from Rep US Sample. You can then explore the estimates and see whether they seem to work well.

To do this, you will use a strategy you have used before. You will set Excel to calculate the two variances on the Stats worksheet of the workbook you used in the last project. Then you will use a macro to take samples and record the variance estimates.

Open the workbook that you used last in the project in Section 6.3. Empty the Stats worksheet by right-clicking the upper-left gray cell and selecting Clear Contents.

Fill in the Stats worksheet as shown below. The boldfaced entries have to be typed by hand. The rest you can get by dragging the formula. Until you fill in the observations in column B, Excel will mark the cells containing equations with “#DIV/0!” to let you know that the formula for the mean is dividing by 0. That’s okay. It will get straightened out when you put the observations in.

 

A

B C D E F G H

1

Group

Observation Group Mean Deviation Squared
Deviation
Sum of
Squared
Deviations
Degrees
of
Freedom
Variance
Within

2

A

  =AVERAGE(B$2:B$6) =B2-C2 =D2^2 =SUM(E:E) 9 =F2/G2

3

A

  =AVERAGE(B$2:B$6) =B3-C3 =D3^2      

4

A

  =AVERAGE(B$2:B$6) =B4-C4 =D4^2      

5

A

  =AVERAGE(B$2:B$6) =B5-C5 =D5^2      

6

A

  =AVERAGE(B$2:B$6) =B6-C6 =D6^2      

7

B

  =AVERAGE(B$7:B$12) =B7-C7 =D7^2      

8

B

  =AVERAGE(B$7:B$12) =B8-C8 =D8^2      

9

B

  =AVERAGE(B$7:B$12) =B9-C9 =D9^2      

10

B

  =AVERAGE(B$7:B$12) =B10-C10 =D10^2  

11

B

  =AVERAGE(B$7:B$12) =B11-C11 =D11^2      
12 B   =AVERAGE(B$7:B$12) =B12-C12 =D12^2      

The macro will fill in the observations. You can change the number of observations in a group by adding or subtracting rows and dragging the formulas in C, D, and E as needed. If you add groups, you will have to set the formula for the new groups in column C to get the mean of each new group.

Review the plan above. Make sure you understand what the sheet does.

Choose which column of data you want to work with. The macro below is set up to take data from the fifth column (Interest Income).

The macro will work only with data that does not have missing values. So it won’t work with education or year of entry.

Here is the macro that will collect the values in column I (the ninth column).

Sub Sampling()
  Randomize Timer
  For r = 2 To 55           ‘Set your number of samples by changing 55.
    For SampleRow = 2 To 12 ‘Set to your total sample size by changing 12.
      dataRow = 2 + Fix(50000 * Rnd)
      Worksheets("Data").Rows(dataRow).Copy
      Worksheets("Sample").Rows(SampleRow).PasteSpecial
    Next SampleRow
    Worksheets("Sample").Columns(5).Copy ‘Change “5” to copy your column.
    Worksheets("Stats").Columns(2).PasteSpecial
    Worksheets("Stats").Cells(r, 9).Value = Cells(2, 8).Value
  Next r
End Sub

Get the mean and a histogram of the estimated variances. In the Data worksheet, you can get the population variance with =var() .

So far you have estimated the variance from the variation within each group. How did you do?

To get the variance calculated from the variation between groups, insert a column by right-clicking on D and selecting Insert. Then edit the worksheet to fit the following scheme. Note the changes you have to make to column E.

 

 

A

B C D E F G H I

1

Group

  Group Mean Overall Mean Deviation Squared
Deviation
Sum of
Squared
Deviations
Degrees
of
Freedom
Variance
Within

2

A

=AVERAGE(B$2:B$6) =AVERAGE(B$2:B$12) =C2-D2 =E2^2 =SUM(F:F) 2 =G2/H2

3

A

=AVERAGE(B$2:B$6) =AVERAGE(B$2:B$12) =C3-D3 =E3^2      

4

A

=AVERAGE(B$2:B$6) =AVERAGE(B$2:B$12) =C4-D4 =E4^2      

5

A

=AVERAGE(B$2:B$6) =AVERAGE(B$2:B$12) =C5-D5 =E5^2      

6

A

=AVERAGE(B$2:B$6) =AVERAGE(B$2:B$12) =C6-D6 =E6^2      

7

B

=AVERAGE(B$7:B$12) =AVERAGE(B$2:B$12) =C7-D7 =E7^2      

8

B

=AVERAGE(B$7:B$12) =AVERAGE(B$2:B$12) =C8-D8 =E8^2      

9

B

=AVERAGE(B$7:B$12) =AVERAGE(B$2:B$12) =C9-D9 =E9^2      

10

B

=AVERAGE(B$7:B$12) =AVERAGE(B$2:B$12) =C10-D10 =E10^2      

11

B

=AVERAGE(B$7:B$12) =AVERAGE(B$2:B$12) =C11-D11 =E11^2      
12 B =AVERAGE(B$7:B$12) =AVERAGE(B$2:B$12) =C12-D12 =E12^2      

Because the variance estimate is now one column over, we need to make some changes to the macro. Here’s the macro with the two changes noted in boldface on the third-to-last line.

Sub Sampling()
  Randomize Timer
  For r = 2 To 55 ‘Set your number of samples by changing 55.
    For SampleRow = 2 To 12 ‘Set to your total sample size by changing 12.
      dataRow = 2 + Fix(50000 * Rnd)
      Worksheets("Data").Rows(dataRow).Copy
      Worksheets("Sample").Rows(SampleRow).PasteSpecial
    Next SampleRow
    Worksheets("Sample").Columns(5).Copy ‘Change “5” to copy your column.
    Worksheets("Stats").Columns(2).PasteSpecial
    Worksheets("Stats").Cells(r, 10).Value = Cells(2, 9).Value
  Next r
End Sub

How do the estimated variances compare with the real variance of the population? Do the claims in Section 7.3 seem sensible?

Try different sample sizes. Would you like to propose any warnings for these variance estimates?


©2008 Key College Publishing. All rights reserved.