Data Matters with Excel®
Activity 3.3
In this project, we are going to perform a Monte Carlo test of the difference between two proportions. The difference that we saw in the data was 4.7%. To do the Monte Carlo test, we will take pairs of 1,000-observation samples. Every sample will come from a probability of 40.25%. We will calculate the difference in each pair and save those differences. The p-value is the proportion of the differences we find that is at least as far from 0 as 4.7%.
Here is how to run the Monte Carlo simulations.
- There will be two random numbers columns and two Honors columns, which will represent who made the honor roll at each school.
- Two cells will calculate the honors proportion at the two schools, and one cell will calculate the difference between the proportions of the current samples.
- A macro will calculate the page (taking new samples) and fill a column of differences.
Step 1: There will be two random numbers columns and two Honors columns, which will represent who made the honor roll at each school.
To label the first column, type Random1 into the top cell of column A. Type =rand() into the second row of column A.
Label the second column Honor1 and type =A2<.4025 into the second row.
Label the third column Random2 and type =RAND() into the second row.
Label the fourth column Honor2 and type =C2<.4025 into the second row.
Click and drag to select the second row of the first four columns. Then click on the small square and drag down to fill to row 11. That gives you 10 observations in each simulated school. You will increase the sample to 1,000 later, after you fully understand what the worksheet is doing.
Step 2: Two cells will calculate the honors proportion at the two schools, and one cell will calculate the difference between the proportions of the current samples.
Label the fifth column Proportion1 and type =COUNTIF(B:B,TRUE)/10 into the second row.
Label the sixth column Proportion2 and type =COUNTIF(D:D,TRUE)/10 into the second row.
Check that each of these cells has the right proportion.
Label the seventh column Difference and type =E2 F2 into the second row. Check that this is giving you the difference between the two proportions.
If the proportions look correct, edit the cells with the proportions so that they are dividing by 1,000 (=COUNTIF(B:B,TRUE)/1000 and =COUNTIF(D:D,TRUE)/1000). Then select the second row of the first four columns again, click on the small black square, and drag down to row 1,001. Now you have proportions in two samples of 1,000 and the difference between the proportions.
Step 3: A macro will calculate the page (taking new samples) and fill a column of differences.
Select Tools, Macro, Visual Basic Editor, Insert, Module.
Type in this macro.
Sub Record ()
For sample = 2 TO 1001
Calculate
Cells(sample, 8).Value = Cells(2,7).Value
Next sample
End Sub
|
This macro does something 1,000 times. The For sample = 2 TO 1001 . . . Next sample makes it do the task 1,000 times. The first time, sample is 2. The second time, sample is 3, and so on to the 1,000th time, when sample is 1,001.
The first thing that is done inside the For-To/Next loop is Calculate. Calculate resets all the random numbers to new random numbers, and so takes two new samples. That leads to a new difference.
The next step is Cells(sample, 8).Value = Cells(2,7).Value. That copies the difference in the second row of the seventh column into the sample row of the eighth column. The first time, it copies into the second row, the second time, into the third row, and so on.
Run the macro and get a histogram of the differences. Select Tools, Data Analysis, Histogram, OK. Click on the top difference, then drag to select all of the differences. Click OK. Click on the chart button then Finish.
Now you can see how differences between pairs of proportions would be distributed if both proportions in each pair were drawn from a system with a constant probability of 40.25%. What can you say about the p-value? Is a difference of 4.7% or larger a reasonably likely event if the null hypothesis is true? What do you conclude about the null hypothesis that the probability of making the honor roll was 40.25% in both years?
Save Your Work
You can save yourself some work later on by saving this workbook and revising it for later projects.
| ©2008 Key College Publishing. All rights reserved. |
|