Data Matters with Excel®
Activity 4.3
Section 4.3 makes a set of claims about p-values of chi-square statistics. They are summarized in Table 4.3.3. from Data Matters.
Table 4.3.3 (from Data Matters)
A Few Critical Values for Pearsons Chi-Square Statistic
|
df = (Rows 1) x (Columns 1)
|
Critical Chi-Square (for alpha = 5%)
|
|
1
|
3.84
|
|
2
|
5.99
|
|
3
|
7.81
|
|
4
|
9.49
|
|
5
|
11.07
|
Your task in this project is to check those critical values. You will do that by collecting measures from random samples from a simulation in which the null hypothesis is true, saving the chi-square values. You can then sort the chi-square values and make sure that only 5% are above each critical value.
One Degree of Freedom
Add and rename worksheets until you have four: Sample, Expected, Difference2, and RelativeDifference2. You will set up the worksheets, then use a macro to collect samples and record the chi-square values.
In Sample, label the first column Rows and the second column Columns. In the second row of either one, enter this formula: =FLOOR(2*RAND(),1) .
Heres how that formula works: RAND() produces random numbers from 0 to just under 1. When those numbers are multiplied by 2, you get numbers ranging from 0 to 1.999999. FLOOR strips off the decimals of positive numbers, leaving 0s and 1s. Later, you can multiply by larger numbers to get more rows, more columns, or more of both. FLOOR(3*RAND(),1) produces 0s, 1s, and 2s.
With this equation, the 0s and 1s are equally likely. With FLOOR(1.5*RAND(),1), 0s are twice as likely as 1s. If you would like, you can play with the relative probabilities to see if it makes a difference.
Drag the equation into the other column. Select the equations in both columns and drag down to one more row than the number of observations you want to use. You can use whatever sample size you would like, but you have to be careful that none of the cells will have expected counts below 5.
Get a pivotTable of the observations in the two columns (Data, PivotTable and PivotChart Report, Next, and so on). Make sure the PivotTable appears on a new worksheet and name that worksheet Observed.
In Observed, press Ctrl-A to select the entire sheet, right-click on a cell, and select Copy. In Expected, right-click on A1, select PasteSpecial, Values, OK. You are pasting the table into Expected so that you can see which cells in Observed include the table. In the Expected cells that currently contain the row totals (probably E5 and E6), enter a formula to replace the current values with the row proportions by getting the totals from Observed and dividing by the number of observations. For example, if you are entering the formula in E5 and you used 55 observations, enter =Observed!E5/55 . If you are entering the formula for E6 and you used 78 observations, enter =Observed!E6/78 .
In the cells that show the column totals, enter a formula to copy the column totals from Observed. For example, if you are entering a formula into B8, use =Observed!B8 . If you are entering a formula into C9, use =Observed!C9 .
Now we can fill Expecteds table with the expected counts. To get the expected counts, use the same steps you used in the last project.
In B5, type = , click on the row proportion for that row (on row 5), type an asterisk, and click on the column total for that column (in column B). Click to the left of the letter in that cell identifier for the row proportion and type a dollar sign so that it becomes $F5 or $H5 or $M5 or whatever. Click to the left of the number in the column totals cell identifier and type another dollar sign, this time so it becomes B$8 or B$10 or B$11. Drag the equation to the right, then down to fill the cells of the table.
In Difference2, click on B5 and enter =(Observed!B5-Expected!B5)^2 . Drag that equation to the right and down to cover the cells of the table.
In RelativeDifference2, click on B5 and enter =Difference2!B5/Expected!B5 . Drag that equation to cover the cells of the table.
Because of the way Excel made the PivotTable in Observed, the table you just made in RelativeDifference2 does not appear in column A or in row 1. You will set Excel to calculate the chi-square value in cell B1 and save the chi-square values in column A.
Click on B1, then on the summation sign (∑) at the top of the screen. Click and drag to select the relative squared deviations and press Enter. The chi-square value appears in B1.
Label column A Chi-Square Values by entering that label in A1.
From RelativeDifference2, get into the macro editor (Tools, Macro, Visual Basic Editor) and insert a module. Enter this code into the module.
Sub record()
For sample = 2 to 401
Sheets("Sample").Calculate
Sheets("Observed").PivotTables("PivotTable1").RefreshTable
Cells(sample,1).value=Cells(1,2).value
Next sample
End Sub
|
The macro will work only if RelativeDifference2 is the last sheet you were looking at before you ran the macro.
The macro takes a new sample on the Sample worksheet and updates the PivotTable on Observed. The values in Difference2 and RelativeDifference2 change automatically. Then the macro copies the new chi-square value in B2 to column A.
That macro is set to collect 400 samples. Five percent of 400 is 20, so the cutoff would appear between the 19th and 20th largest chi-square values. You could use another number of samples.
To check the cutoffs in Table 4.3.3, sort the chi-square values by selecting their column and clicking on the Sort-Descending button at the top of the screen. The Sort-Descending button has a Z over an A on it. Then you can find the chi-square values that were higher than 95% of the chi-square values.
How did the table do? Is 3.84 a good cutoff?
For Degrees of Freedom over One
To work with cross-tabs that have more rows and/or more columns, go back to the formulas for Row and/or Column. FLOOR(3*RAND()) will produce three categories, FLOOR(4*RAND()) will produce four categories, and so on.
To replace Observed, delete it and get a new pivot table. Rename the PivotTables sheet Observed.
You will have to edit or completely replace and rebuild Expected. Difference2 and RelativeDifference2 can be updated by selecting cells in their tables, then dragging their formulas over the new cells of the table.
Check all five rows of Table 4.3.3. Find the critical cutoff for six degrees of freedom.
Try different cross-tab designs that have the same degrees of freedom, like a three-column-by-three-row design and a five-column-by-two-row design. Do the cutoffs stay in the same place? Do the chi-square values really have the same distribution when the null hypothesis is true?
Check whether sample size matters.
Check whether it is really true that the expected counts must be at least 5. You can do this by using a two-column-by-two-row design with 19 observations in each sample.
In the previous section, you found the chi-square values for tests of correlations in the U.S. population. You were working with the representative U.S. sample that is available from the Data Matters web site at www.keycollege.com/dm. Get the p-values and report those tests in two ways: once in the style of the popular press and once in the complete style of the sciences.
| ©2008 Key College Publishing. All rights reserved. |
|