General Information
  Home
Author Bio
Product/Purchase Info

Instructor Resources
Registration Required
  Register
Download Instructor Resources

Computer Activities
and Data Sets
  Table of Contents
Excel
Fathom
SPSS

Community
  Contact the Author
Ideas/Comments for Publisher
Testimonials
Coming soon!

Other Key Sites
  Key Curriculum Press

Key College Publishing

Data Matters with Excel®

Activity 3.1

Section 3.1 claims that if we take random samples from a population, 95% of the time the proportions in those samples will fall within two standard errors of their matching population proportions. That is a 95% prediction interval. A 67% prediction interval is from one standard error below the population’s proportion to one standard error above it.

To test this claim, we are going to use the RepUSSample.xls file that we used in Section 2.1.

The project in Section 3.1 requires these steps.

  1. Find out some of the population’s proportions.
  2. Pick a sample size and create prediction intervals for random samples’ proportions.
  3. Use your software to take random samples.
  4. Check what proportion fell within your prediction interval and what proportion fell outside.

Here’s how to do each step.

Step 1: Find out some of the population’s proportions.

Open RepUSSample.xls. Choose data whose proportions interest you (Data, PivotTable and PivotChart Report. . ., Next, and so on). When the PivotTable wizard asks you what data to use, you can click on the L in gray over the Year_of_Immigration column and drag left to select all the columns of data, then click on Next. Click on New Worksheet before setting up the layout, so you don’t add to the rows in your Data worksheet.

In Layout, drag the variable you are interested in, into Rows and Data. Double-click on the one you dragged into Data, click on Options, then on the black triangle that appears. Select % of Column. Click OK, OK, Finish.

Step 2: Pick a sample size and create prediction intervals for random samples’ proportions.

Check Section 3.1 in Data Matters if you’re not sure how to do this.

Step 3: Use your software to take random samples.

To do this, you will create two new worksheets, Sample and Proportions. Then, just like you did in Section 2.2, you will collect a sample and record the proportion. The difference is that to get the data, you will first take a random sample of the data in the same way you took samples in Section 2.1. You will put that sample into your Sample worksheet using a macro you set up in Section 2.1. Then you will copy the data for the variable you’re interested in, paste it into your Proportions worksheet, and calculate and save the proportion. You will create a macro that can do this over and over again.

To get the two new worksheets, select Insert, Worksheet. Double-click on the tab at the bottom of the screen that has the new worksheet’s name. (The name is something like “sheet1.”) Type in Sample and press “Enter.” Repeat these steps to create a Proportions worksheet.

If you would like, copy the first row from your Data worksheet and paste it into the first row of your Sample worksheet. It’s not essential, but it may help you keep track of what you’re doing.

Open your Proportions worksheet. To keep the macro similar to the one we used in Section 2.2, we’re going to skip the first column. Put the following labels into the first row of your Proportions worksheet:

Column B: Data
Column C: Count
Column D: Sample_Size
Column E: Sample_Proportion
Column F: Proportions

Getting the Data into the Sample Worksheet

Here is the macro you set up in Section 2.1. No changes are needed. To enter it, select Tools, Macro, Visual Basic Editor, Insert, Module.

Sub takeSample()
  Randomize Timer
  For SampleRow = 2 To 11
    dataRow = 2 + Fix(50000 * Rnd)
    Worksheets("Data").Rows(dataRow).Copy 
    Worksheets("Sample").Rows(SampleRow).PasteSpecial
  Next SampleRow
End Sub

As before, to run the macro, click on the go button, marked with a triangle.

This will create a sample of 10 people. If you want to work with a larger or smaller sample, change the “11” to one more than your sample size.

Calculating the Proportion

In your Sample worksheet, copy the column of the data you are interested in by right-clicking on the gray box at the top of that column and selecting Copy. Open your Proportions worksheet, click on the B in gray at the top, and select Paste.

In the top data cell (just under the column name) of column C, type =COUNTIF(B:B , whatever measurement you want to calculate the proportion for, and ) . If you want to count something that is text, you have to put the text in quotation marks. For example, if you wanted to calculate the proportion of females in the data in column B, you would type =COUNTIF(B:B,”Female”) . If you wanted to count the proportion of people whose education had stopped after graduating from high school, you would type =COUNTIF(B:B,12) . Education is a numeric measurement, so you don’t put quotation marks around the 12.

In the top data cell of column D, type the number of observations in your sample. Remember that your sample is one less than the number of rows your data appear on.

In the top data cell of column E, enter the equation =C2/D2 . That gives you the proportion for your first sample.

The Macro

Now you need a macro so that you can take many samples and calculate their proportions.

Here is the macro from Section 2.2.

Sub Sampling()
  For recordingRow = 2 To 5
    Calculate
    Cells(recordingRow, 6).Value = Cells(2, 5).Value
  Next recordingRow
End Sub

With this macro, we got a new sample every time Excel ran Calculate. That was because we had random numbers in our worksheet. Now we need Excel to take a random sample from Data and put it into Sample, then copy the column we are interested in from Sample into column B of Proportions. To do that, we start by replacing Calculate with our earlier sampling macro. Here’s what we have (with the changes in boldface).

Sub Sampling()
  For recordingRow = 2 To 5
    Randomize Timer
    For SampleRow = 2 To 11
      dataRow = 2 + Fix(50000 * Rnd)
      Worksheets("Data").Rows(dataRow).Copy
      Worksheets("Sample"). Rows(SampleRow).PasteSpecial
    Next SampleRow

    Cells(recordingRow, 6).Value = Cells(2, 5).Value
  Next recordingRow
End Sub

We need to set the random number generator only once, so we’ll move Randomize Timer outside the For-To/Next Loop.

Sub Sampling()
  Randomize Timer
  For recordingRow = 2 To 5
    For SampleRow = 2 To 11
      dataRow = 2 + Fix(50000 * Rnd)
      Worksheets("Data").Rows(dataRow).Copy
      Worksheets("Sample").Rows(SampleRow).PasteSpecial
    Next SampleRow

    Cells(recordingRow, 6).Value = Cells(2, 5).Value
  Next recordingRow
End Sub

All that’s left is a code for copying the column in the Sample worksheet that contains the variable you’re interested in and pasting it into column B of your Proportions worksheet. Here’s the macro code that would do that if you were interested in Gender. Gender is in the fourth column, column D.

Worksheets("Sample").Columns(4).Copy Worksheets("Proportions").Columns(2).PasteSpecial

Compare that macro code with the previous code. It is very much like what you did when you were copying rows from Data to Sample. But with this new code, you copy columns rather than rows. For whatever variable you’re interested in, determine which column (by number) contains that data, then change the number in Columns(). For example, if you wanted to get the data for Marital Status, which is in the seventh column, the code would look like this.

Worksheets("Sample").Columns(7).Copy
Worksheets("Proportions").Columns(2).PasteSpecial

Now put that remaining code into your program and you’re done.

Sub Sampling()
  Randomize Timer
  For recordingRow = 2 To 5
    For SampleRow = 2 To 11
      dataRow = 2 + Fix(50000 * Rnd)
      Worksheets("Data").Rows(dataRow).Copy
      Worksheets("Sample").Rows(SampleRow).PasteSpecial
    Next SampleRow
    Worksheets("Sample").Columns(4).Copy
    Worksheets("Proportions").Columns(2).PasteSpecial
    Cells(recordingRow, 6).Value = Cells(2, 5).Value
  Next recordingRow
End Sub

This macro assumes that the last worksheet you looked at was the Proportions worksheet. It won’t run correctly unless that is the case. So before you run it, open your Proportions worksheet, then go directly to the macro editor.

Run the program to make sure it is working, then set the number of samples to as many as you would like by editing For SampleRow = 2 To 11. I would recommend 100 or 1,000 to make it easy to check your prediction interval. (If you have a slow computer, it may take too long to do this. If this is the case, you can stop it by holding down “Ctrl” and pressing the “Break” key.)

Step 4: Check what proportion fell within your prediction interval and what proportion fell outside.

To sort the samples’ proportions, select their column by clicking on the F in gray, then the Sort Ascending button at the top of the screen. Then scroll through to check how well your prediction interval worked.

Does your prediction interval work correctly? That is, for the 95% prediction interval, do 95% fall inside the interval and 5% fall outside? Do the samples’ proportions that fall outside fall evenly, with 2.5% on each side? For the 67% prediction interval, do two-thirds fall inside? Does one-sixth fall below and one-sixth above the interval?

Try other sample sizes. Try other variables and measurements. Does the prediction interval work equally well for all population proportions? What if your prediction interval goes below 0 or above 100%? Is it sensible to adjust to those limits? Should you change the other side of the interval?

Save Your Work

In later projects, we will build on the workbook you developed for this project, so be sure to save a copy of this workbook.


©2008 Key College Publishing. All rights reserved.