Data Matters with Excel®
Activity 3.1
Section 3.1 claims that if we take random samples from a population, 95% of the time the proportions in those samples will fall within two standard errors of their matching population proportions. That is a 95% prediction interval. A 67% prediction interval is from one standard error below the populations proportion to one standard error above it.
To test this claim, we are going to use the RepUSSample.xls file that we used in Section 2.1.
The project in Section 3.1 requires these steps.
- Find out some of the populations proportions.
- Pick a sample size and create prediction intervals for random samples proportions.
- Use your software to take random samples.
- Check what proportion fell within your prediction interval and what proportion fell outside.
Heres how to do each step.
Step 1: Find out some of the populations proportions.
Open RepUSSample.xls. Choose data whose proportions interest you (Data, PivotTable and PivotChart Report. . ., Next, and so on). When the PivotTable wizard asks you what data to use, you can click on the L in gray over the Year_of_Immigration column and drag left to select all the columns of data, then click on Next. Click on New Worksheet before setting up the layout, so you dont add to the rows in your Data worksheet.
In Layout, drag the variable you are interested in, into Rows and Data. Double-click on the one you dragged into Data, click on Options, then on the black triangle that appears. Select % of Column. Click OK, OK, Finish.
Step 2: Pick a sample size and create prediction intervals for random samples proportions.
Check Section 3.1 in Data Matters if youre not sure how to do this.
Step 3: Use your software to take random samples.
To do this, you will create two new worksheets, Sample and Proportions. Then, just like you did in Section 2.2, you will collect a sample and record the proportion. The difference is that to get the data, you will first take a random sample of the data in the same way you took samples in Section 2.1. You will put that sample into your Sample worksheet using a macro you set up in Section 2.1. Then you will copy the data for the variable youre interested in, paste it into your Proportions worksheet, and calculate and save the proportion. You will create a macro that can do this over and over again.
To get the two new worksheets, select Insert, Worksheet. Double-click on the tab at the bottom of the screen that has the new worksheets name. (The name is something like sheet1.) Type in Sample and press Enter. Repeat these steps to create a Proportions worksheet.
If you would like, copy the first row from your Data worksheet and paste it into the first row of your Sample worksheet. Its not essential, but it may help you keep track of what youre doing.
Open your Proportions worksheet. To keep the macro similar to the one we used in Section 2.2, were going to skip the first column. Put the following labels into the first row of your Proportions worksheet:
Column B: Data
Column C: Count
Column D: Sample_Size
Column E: Sample_Proportion
Column F: Proportions
Getting the Data into the Sample Worksheet
Here is the macro you set up in Section 2.1. No changes are needed. To enter it, select Tools, Macro, Visual Basic Editor, Insert, Module.
Sub takeSample()
Randomize Timer
For SampleRow = 2 To 11
dataRow = 2 + Fix(50000 * Rnd)
Worksheets("Data").Rows(dataRow).Copy
Worksheets("Sample").Rows(SampleRow).PasteSpecial
Next SampleRow
End Sub
|
As before, to run the macro, click on the go button, marked with a triangle.
This will create a sample of 10 people. If you want to work with a larger or smaller sample, change the 11 to one more than your sample size.
Calculating the Proportion
In your Sample worksheet, copy the column of the data you are interested in by right-clicking on the gray box at the top of that column and selecting Copy. Open your Proportions worksheet, click on the B in gray at the top, and select Paste.
In the top data cell (just under the column name) of column C, type =COUNTIF(B:B , whatever measurement you want to calculate the proportion for, and ) . If you want to count something that is text, you have to put the text in quotation marks. For example, if you wanted to calculate the proportion of females in the data in column B, you would type =COUNTIF(B:B,Female) . If you wanted to count the proportion of people whose education had stopped after graduating from high school, you would type =COUNTIF(B:B,12) . Education is a numeric measurement, so you dont put quotation marks around the 12.
In the top data cell of column D, type the number of observations in your sample. Remember that your sample is one less than the number of rows your data appear on.
In the top data cell of column E, enter the equation =C2/D2 . That gives you the proportion for your first sample.
The Macro
Now you need a macro so that you can take many samples and calculate their proportions.
Here is the macro from Section 2.2.
Sub Sampling()
For recordingRow = 2 To 5
Calculate
Cells(recordingRow, 6).Value = Cells(2, 5).Value
Next recordingRow
End Sub
|
With this macro, we got a new sample every time Excel ran Calculate. That was because we had random numbers in our worksheet. Now we need Excel to take a random sample from Data and put it into Sample, then copy the column we are interested in from Sample into column B of Proportions. To do that, we start by replacing Calculate with our earlier sampling macro. Heres what we have (with the changes in boldface).
Sub Sampling()
For recordingRow = 2 To 5
Randomize Timer
For SampleRow = 2 To 11
dataRow = 2 + Fix(50000 * Rnd)
Worksheets("Data").Rows(dataRow).Copy
Worksheets("Sample"). Rows(SampleRow).PasteSpecial
Next SampleRow
Cells(recordingRow, 6).Value = Cells(2, 5).Value
Next recordingRow
End Sub
|
We need to set the random number generator only once, so well move Randomize Timer outside the For-To/Next Loop.
Sub Sampling()
Randomize Timer
For recordingRow = 2 To 5
For SampleRow = 2 To 11
dataRow = 2 + Fix(50000 * Rnd)
Worksheets("Data").Rows(dataRow).Copy
Worksheets("Sample").Rows(SampleRow).PasteSpecial
Next SampleRow
Cells(recordingRow, 6).Value = Cells(2, 5).Value
Next recordingRow
End Sub
|
All thats left is a code for copying the column in the Sample worksheet that contains the variable youre interested in and pasting it into column B of your Proportions worksheet. Heres the macro code that would do that if you were interested in Gender. Gender is in the fourth column, column D.
Worksheets("Sample").Columns(4).Copy Worksheets("Proportions").Columns(2).PasteSpecial
|
Compare that macro code with the previous code. It is very much like what you did when you were copying rows from Data to Sample. But with this new code, you copy columns rather than rows. For whatever variable youre interested in, determine which column (by number) contains that data, then change the number in Columns(). For example, if you wanted to get the data for Marital Status, which is in the seventh column, the code would look like this.
Worksheets("Sample").Columns(7).Copy
Worksheets("Proportions").Columns(2).PasteSpecial
|
Now put that remaining code into your program and youre done.
Sub Sampling()
Randomize Timer
For recordingRow = 2 To 5
For SampleRow = 2 To 11
dataRow = 2 + Fix(50000 * Rnd)
Worksheets("Data").Rows(dataRow).Copy
Worksheets("Sample").Rows(SampleRow).PasteSpecial
Next SampleRow
Worksheets("Sample").Columns(4).Copy
Worksheets("Proportions").Columns(2).PasteSpecial
Cells(recordingRow, 6).Value = Cells(2, 5).Value
Next recordingRow
End Sub
|
This macro assumes that the last worksheet you looked at was the Proportions worksheet. It wont run correctly unless that is the case. So before you run it, open your Proportions worksheet, then go directly to the macro editor.
Run the program to make sure it is working, then set the number of samples to as many as you would like by editing For SampleRow = 2 To 11. I would recommend 100 or 1,000 to make it easy to check your prediction interval. (If you have a slow computer, it may take too long to do this. If this is the case, you can stop it by holding down Ctrl and pressing the Break key.)
Step 4: Check what proportion fell within your prediction interval and what proportion fell outside.
To sort the samples proportions, select their column by clicking on the F in gray, then the Sort Ascending button at the top of the screen. Then scroll through to check how well your prediction interval worked.
Does your prediction interval work correctly? That is, for the 95% prediction interval, do 95% fall inside the interval and 5% fall outside? Do the samples proportions that fall outside fall evenly, with 2.5% on each side? For the 67% prediction interval, do two-thirds fall inside? Does one-sixth fall below and one-sixth above the interval?
Try other sample sizes. Try other variables and measurements. Does the prediction interval work equally well for all population proportions? What if your prediction interval goes below 0 or above 100%? Is it sensible to adjust to those limits? Should you change the other side of the interval?
Save Your Work
In later projects, we will build on the workbook you developed for this project, so be sure to save a copy of this workbook.
| ©2008 Key College Publishing. All rights reserved. |
|