General Information
  Home
Author Bio
Product/Purchase Info

Instructor Resources
Registration Required
  Register
Download Instructor Resources

Computer Activities
and Data Sets
  Table of Contents
Excel
Fathom
SPSS

Community
  Contact the Author
Ideas/Comments for Publisher
Testimonials
Coming soon!

Other Key Sites
  Key Curriculum Press

Key College Publishing

Data Matters with Excel®

Activity 4.3

Section 4.3 makes a set of claims about p-values of chi-square statistics. They are summarized in Table 4.3.3. from Data Matters.

Table 4.3.3 (from Data Matters)
A Few Critical Values for Pearson’s Chi-Square Statistic

df = (Rows – 1) x (Columns – 1)

Critical Chi-Square (for alpha = 5%)

1

3.84

2

5.99

3

7.81

4

9.49

5

11.07

Your task in this project is to check those critical values. You will do that by collecting measures from random samples from a simulation in which the null hypothesis is true, saving the chi-square values. You can then sort the chi-square values and make sure that only 5% are above each critical value.

One Degree of Freedom

Add and rename worksheets until you have four: Sample, Expected, Difference2, and RelativeDifference2. You will set up the worksheets, then use a macro to collect samples and record the chi-square values.

In Sample, label the first column Rows and the second column Columns. In the second row of either one, enter this formula: =FLOOR(2*RAND(),1) .

Here’s how that formula works: RAND() produces random numbers from 0 to just under 1. When those numbers are multiplied by 2, you get numbers ranging from 0 to 1.999999. FLOOR strips off the decimals of positive numbers, leaving 0’s and 1’s. Later, you can multiply by larger numbers to get more rows, more columns, or more of both. FLOOR(3*RAND(),1) produces 0’s, 1’s, and 2’s.

With this equation, the 0’s and 1’s are equally likely. With FLOOR(1.5*RAND(),1), 0’s are twice as likely as 1’s. If you would like, you can play with the relative probabilities to see if it makes a difference.

Drag the equation into the other column. Select the equations in both columns and drag down to one more row than the number of observations you want to use. You can use whatever sample size you would like, but you have to be careful that none of the cells will have expected counts below 5.

Get a pivotTable of the observations in the two columns (Data, PivotTable and PivotChart Report, Next, and so on). Make sure the PivotTable appears on a new worksheet and name that worksheet Observed.

In Observed, press “Ctrl-A” to select the entire sheet, right-click on a cell, and select Copy. In Expected, right-click on A1, select PasteSpecial, Values, OK. You are pasting the table into Expected so that you can see which cells in Observed include the table. In the Expected cells that currently contain the row totals (probably E5 and E6), enter a formula to replace the current values with the row proportions by getting the totals from Observed and dividing by the number of observations. For example, if you are entering the formula in E5 and you used 55 observations, enter =Observed!E5/55 . If you are entering the formula for E6 and you used 78 observations, enter =Observed!E6/78 .

In the cells that show the column totals, enter a formula to copy the column totals from Observed. For example, if you are entering a formula into B8, use =Observed!B8 . If you are entering a formula into C9, use =Observed!C9 .

Now we can fill Expected’s table with the expected counts. To get the expected counts, use the same steps you used in the last project.

In B5, type = , click on the row proportion for that row (on row 5), type an asterisk, and click on the column total for that column (in column B). Click to the left of the letter in that cell identifier for the row proportion and type a dollar sign so that it becomes “$F5” or “$H5” or “$M5” or whatever. Click to the left of the number in the column total’s cell identifier and type another dollar sign, this time so it becomes “B$8” or “B$10” or “B$11.” Drag the equation to the right, then down to fill the cells of the table.

In Difference2, click on B5 and enter =(Observed!B5-Expected!B5)^2 . Drag that equation to the right and down to cover the cells of the table.

In RelativeDifference2, click on B5 and enter =Difference2!B5/Expected!B5 . Drag that equation to cover the cells of the table.

Because of the way Excel made the PivotTable in Observed, the table you just made in RelativeDifference2 does not appear in column A or in row 1. You will set Excel to calculate the chi-square value in cell B1 and save the chi-square values in column A.

Click on B1, then on the summation sign (∑) at the top of the screen. Click and drag to select the relative squared deviations and press “Enter.” The chi-square value appears in B1.

Label column A Chi-Square Values by entering that label in A1.

From RelativeDifference2, get into the macro editor (Tools, Macro, Visual Basic Editor) and insert a module. Enter this code into the module.

Sub record()
  For sample = 2 to 401
    Sheets("Sample").Calculate
    Sheets("Observed").PivotTables("PivotTable1").RefreshTable
    Cells(sample,1).value=Cells(1,2).value
  Next sample
End Sub

The macro will work only if RelativeDifference2 is the last sheet you were looking at before you ran the macro.

The macro takes a new sample on the Sample worksheet and updates the PivotTable on Observed. The values in Difference2 and RelativeDifference2 change automatically. Then the macro copies the new chi-square value in B2 to column A.

That macro is set to collect 400 samples. Five percent of 400 is 20, so the cutoff would appear between the 19th and 20th largest chi-square values. You could use another number of samples.

To check the cutoffs in Table 4.3.3, sort the chi-square values by selecting their column and clicking on the Sort-Descending button at the top of the screen. The Sort-Descending button has a “Z” over an “A” on it. Then you can find the chi-square values that were higher than 95% of the chi-square values.

How did the table do? Is 3.84 a good cutoff?

For Degrees of Freedom over One

To work with cross-tabs that have more rows and/or more columns, go back to the formulas for Row and/or Column. FLOOR(3*RAND()) will produce three categories, FLOOR(4*RAND()) will produce four categories, and so on.

To replace Observed, delete it and get a new pivot table. Rename the PivotTable’s sheet Observed.

You will have to edit or completely replace and rebuild Expected. Difference2 and RelativeDifference2 can be updated by selecting cells in their tables, then dragging their formulas over the new cells of the table.

Check all five rows of Table 4.3.3. Find the critical cutoff for six degrees of freedom.

Try different cross-tab designs that have the same degrees of freedom, like a three-column-by-three-row design and a five-column-by-two-row design. Do the cutoffs stay in the same place? Do the chi-square values really have the same distribution when the null hypothesis is true?

Check whether sample size matters.

Check whether it is really true that the expected counts must be at least 5. You can do this by using a two-column-by-two-row design with 19 observations in each sample.

In the previous section, you found the chi-square values for tests of correlations in the U.S. population. You were working with the representative U.S. sample that is available from the Data Matters web site at www.keycollege.com/dm. Get the p-values and report those tests in two ways: once in the style of the popular press and once in the complete style of the sciences.


©2008 Key College Publishing. All rights reserved.