Data Matters with SPSS®

Activity 8.1

Your task in this project is to find how F is distributed so you can fill in the following table. Three cells of the table are already filled in. It would be a good idea to do at least two of those first to make sure SPSS is set up correctly.

Table 8.1.3 (from Data Matters)
Critical F-Values for the Analysis of Variance
(If an ANOVA has an F-value greater than
the value in the table, the F is significant.)

 

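Columns: df of Variance Estimated from Between-Group Variation (df numerator: Number of Groups – 1)
Rows: df of Variance Estimated from Within-Group Variation (df denominator: Number of Observations – Number of Groups)

df denominator      df numerator
                       1        2        3
      1              ____     ____     ____
      2             18.56     ____     ____
      3              ____     ____     ____
      4              ____      6.6     ____
     60              ____     ____     ____
    100              ____     ____     2.76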

Two Groups, Four Observations

In the project in Section 7.3, you directed SPSS to create pairs of variance estimates. To do this project, you need to go only one step further, and calculate F. The following is the Syntax program from Section 7.3. You set the number of observations in each group on the third line.

INPUT PROGRAM.
  LOOP #Sample = 1 TO 2000.
    LOOP #Case = 1 to 2.
      COMPUTE sample = #Sample.
      COMPUTE measure = RV.NORMAL(0,7).
      END CASE.
    END LOOP.
  END LOOP.
  END FILE.
END INPUT PROGRAM.
EXECUTE.

To get the set of variances, you can follow the aggregation steps for the project in Section 7.3.

An alternative is to run this Syntax program on your file from Section 7.3.

AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /mean = MEAN(measure) /sd = SD(measure).

COMPUTE ss = (sd**2)*1.
COMPUTE sample = TRUNC(($CASENUM-1)/2).
EXECUTE.

AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /sdMeans = SD(mean) /ssWithin = SUM(ss).

COMPUTE varwithn = sswithin/2.
COMPUTE varBetwn = (sdMeans**2)*2.
EXECUTE.
COMPUTE F = varBetwn/varwithn.
EXECUTE.

Here is what the lines in this Syntax program tell SPSS to do.

AGGREGATE tells SPSS to collapse the cases into one summary case for each value of the break variable.

/OUTFILE=* tells SPSS to replace the working file with the aggregation.

/BREAK=sample indicates that sample is the break variable.

/mean = MEAN(measure) /sd = SD(measure). In this line, /mean = MEAN(measure) indicates that the aggregation should include a variable called mean that is the mean of the measures in each sample. /sd = SD(measure) indicates that there should be a variable named sd that is the standard deviation of the measures in each sample.

COMPUTE ss = (sd**2)*1 . This command runs after the aggregation is complete. The standard deviation is squared (sd**2), which produces the variance. The variance is then multiplied by one less than the number of observations in the group, which produces the sum of squared deviations of each observation from its group mean. (When you revise this Syntax to use larger group sizes, you will have to replace 1 with one less than the number in your groups.)
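The reason this multiplication recovers the sum of squares can be written in symbols. The notation is introduced here for illustration and is not part of the original program: n is the number of observations in a group, x_i are the observations, \bar{x} is the group mean, and s is the group's standard deviation (the variable sd).

```latex
s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}
\quad\Longrightarrow\quad
SS = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2} = (n-1)\,s^{2}
```

With two observations in each group, n - 1 = 1, which is the 1 in the COMPUTE.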

COMPUTE sample = TRUNC(($CASENUM-1)/2) .—This command sets sample so that it will combine the groups. The formula puts two groups together in each sample. To put three together, use TRUNC(($CASENUM-1)/3) . For four groups, divide by 4.
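A small worked example may help show how the formula assigns groups to samples. Here c stands for $CASENUM and k for the number of groups per sample; both symbols are introduced only for this illustration. Because c - 1 is never negative, TRUNC behaves like rounding down.

```latex
\text{sample} = \left\lfloor \frac{c-1}{k} \right\rfloor
\qquad
\begin{array}{c|cccccc}
c     & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
k = 2 & 0 & 0 & 1 & 1 & 2 & 2 \\
k = 3 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
```

Each run of equal values becomes one sample in the second aggregation.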

EXECUTE. runs the computations before the next aggregation.

AGGREGATE
/OUTFILE=*
/BREAK=sample
/sdMeans = SD(mean) /ssWithin = SUM(ss).
This is the second aggregation. It creates two variables. sdMeans is the standard deviation of the group means in each sample. That is used later to calculate the variance estimated from variation between the groups. ssWithin is the total, across the groups in each sample, of the sums of squared deviations within each group.

COMPUTE varwithn = sswithin/2 .—Once the aggregation is complete, the variance from within is calculated by dividing the sum of squares from within by the degrees of freedom. This program is set for two groups and four observations, so there are two degrees of freedom. When you revise the code to work with different numbers of observations and/or groups, you will have to change 2 to the degrees of freedom.

COMPUTE varBetwn = (sdMeans**2)*2 . This command estimates the variance from the variation between the groups. You square the standard deviation of the group means to get the variance of the means. Then you would multiply by one less than the number of groups to get the sum of squared deviations of each group mean from the overall mean. To get the sum of squares from between the groups, you multiply by the number of observations in each group. To get the variance, you divide by one less than the number of groups.

Those steps involve multiplying by one less than the number of groups and, later, dividing by one less than the number of groups. The division cancels out the multiplication, so this COMPUTE doesn’t include those steps.

The last 2 is the number of observations in each group. This calculation wouldn’t work if the groups did not all have the same number of observations. When you change the program to test other designs, you will have to change that 2 to the number of observations in each group.
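The cancellation described above can be written out as a short derivation. The symbols are introduced here for illustration and do not appear in the original program: k is the number of groups, n is the number of observations in each group, \bar{x}_j is the mean of group j, \bar{x} is the mean of the k group means, and s_{\bar{x}} is the standard deviation of the group means (sdMeans in the Syntax).

```latex
\begin{aligned}
s_{\bar{x}}^{2} &= \frac{1}{k-1}\sum_{j=1}^{k}\left(\bar{x}_j - \bar{x}\right)^{2} \\
SS_{\text{between}} &= n\sum_{j=1}^{k}\left(\bar{x}_j - \bar{x}\right)^{2} = n\,(k-1)\,s_{\bar{x}}^{2} \\
\text{varBetwn} &= \frac{SS_{\text{between}}}{k-1} = n\,s_{\bar{x}}^{2}
\end{aligned}
```

With n = 2 this is (sdMeans**2)*2. The derivation relies on every group having the same n, which is why the calculation would not work for unequal groups.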

EXECUTE.

COMPUTE F = varBetwn/varwithn.

EXECUTE.

The final step is to calculate F.
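Putting the pieces together, the ratio can be summarized in symbols. The notation is an assumption made for this sketch, not part of the original Syntax: k is the number of groups in each sample, n is the number of observations in each group, and s_{\bar{x}} is the standard deviation of the group means (sdMeans).

```latex
F = \frac{\text{varBetwn}}{\text{varwithn}}
  = \frac{n\,s_{\bar{x}}^{2}}{\text{ssWithin} \,/\, \bigl(k(n-1)\bigr)},
\qquad
df_{\text{numerator}} = k - 1,
\quad
df_{\text{denominator}} = kn - k = k(n-1)
```

For two groups of two observations, k = 2 and n = 2, so the degrees of freedom are 1 and 2, the cell of Table 8.1.3 that holds 18.56.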

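After you have the F-values, sort them by clicking on Data, Sort Cases. Double-click on F, select Descending under Sort Order so the largest F-values come first, then click OK. Now you can scroll down 5% of the way through the list and find the cutoff. With 1,000 F-values, for example, the cutoff is at about the 50th case.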
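The cutoff can also be found with Syntax instead of the menus. The following is a minimal sketch, not part of the original activity, and it assumes the variable F created by the program above. SORT CASES with (D) puts the largest F-values first, and FREQUENCIES reports the 95th percentile of F, which is the value that the largest 5% of the F-values exceed.

```spss
COMMENT Sort so the largest F-values come first.
SORT CASES BY F (D).
COMMENT Report the 95th percentile of F without printing a frequency table.
FREQUENCIES VARIABLES=F
  /FORMAT=NOTABLE
  /PERCENTILES=95.
```

The percentile that SPSS reports is interpolated, so it may differ slightly from the value you read off the sorted list.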

Try a larger number of samples.

My experience is that, with the design set up in this section and fewer than 10,000 samples, the F-value 5% down the list is more than 1 away from 18.56. When I was creating the entries for that table, I used more than 400,000 samples.

Other Designs

To get the cutoffs for the other sets of degrees of freedom, you can use this Syntax program with edits. This is a complete Syntax, including the first creation of the samples.

There are comments in this program. SPSS ignores the text of any command that begins with COMMENT, up to the period that ends it. The comments identify the lines that need changes as you change the design to find how F is distributed for other degrees of freedom.

INPUT PROGRAM.
COMMENT Edit the next line to change the number of samples.
  LOOP #Sample = 1 TO 4000.
COMMENT Edit the next line to set the number in each group.
    LOOP #Case = 1 to 2.
      COMPUTE sample = #Sample.
      COMPUTE measure = RV.NORMAL(0,7).
      END CASE.
    END LOOP.
  END LOOP.
  END FILE.
END INPUT PROGRAM.
EXECUTE.

AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /mean = MEAN(measure) /sd = SD(measure).
COMMENT Change the 1 in the next line to one less than the number in each group.
COMPUTE ss = (sd**2)*1.
COMMENT Change the 2 in the next line to the number of groups.
COMPUTE sample = TRUNC(($CASENUM-1)/2).
EXECUTE.

AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /sdMeans = SD(mean) /ssWithin = SUM(ss).
COMMENT Set the 2 on the next line to the degrees of freedom from within.
COMPUTE varwithn = sswithin/2.
COMMENT Set the final 2 on the next line to the number in each group.
COMPUTE varBetwn = (sdMeans**2)*2 .
EXECUTE.
COMPUTE F = varBetwn/varwithn.
EXECUTE.
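As an illustration of the edits, here is a sketch of the same program set up for four groups of 26 observations, which gives 3 and 100 degrees of freedom (the cell of Table 8.1.3 that holds 2.76). The design and the numbers 26, 25, 4, and 100 are choices made for this example by following the comments above; the number of samples is left at 4000, which yields 1,000 F-values once the groups are combined four at a time.

```spss
INPUT PROGRAM.
COMMENT 4000 groups, combined four at a time, give 1000 F-values.
  LOOP #Sample = 1 TO 4000.
COMMENT 26 observations in each group.
    LOOP #Case = 1 TO 26.
      COMPUTE sample = #Sample.
      COMPUTE measure = RV.NORMAL(0,7).
      END CASE.
    END LOOP.
  END LOOP.
  END FILE.
END INPUT PROGRAM.
EXECUTE.

AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /mean = MEAN(measure) /sd = SD(measure).
COMMENT 25 is one less than the number in each group.
COMPUTE ss = (sd**2)*25.
COMMENT 4 is the number of groups.
COMPUTE sample = TRUNC(($CASENUM-1)/4).
EXECUTE.

AGGREGATE
  /OUTFILE=*
  /BREAK=sample
  /sdMeans = SD(mean) /ssWithin = SUM(ss).
COMMENT 100 is the degrees of freedom from within, 104 observations minus 4 groups.
COMPUTE varwithn = sswithin/100.
COMMENT 26 is the number in each group.
COMPUTE varBetwn = (sdMeans**2)*26.
EXECUTE.
COMPUTE F = varBetwn/varwithn.
EXECUTE.
```

The number of groups created by the first loop should be a multiple of the number of groups per sample, so that the last sample is not left incomplete.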

Other Distributions and Other Questions

Does it matter what the mean and/or standard deviation of your original distribution is?
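To explore this question, only the line that generates the observations needs to change. The values below (a mean of 100 and a standard deviation of 15) are hypothetical, chosen only for illustration; in RV.NORMAL the first number is the mean and the second is the standard deviation.

```spss
COMMENT Hypothetical values, mean 100 and standard deviation 15 instead of 0 and 7.
      COMPUTE measure = RV.NORMAL(100,15).
```

Rerun the rest of the program unchanged and compare the cutoff with the one you found before.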

Each F-value that was calculated could have been used in a significance test in an ANOVA. For each of those tests, was the null hypothesis true or false? When the calculations found an F that was greater than 20, was the null hypothesis true or false? If an F above 20 were found, would we reject the null hypothesis?

What happens if you use another distribution? For example, you could replace the formula for the observations with RV.BINOM(1,.5). Does that affect the distribution of F? Could you say something about how to run an ANOVA the easy way? Through the Analyze pull-down menu? Is there any particular reason to find F-values through Syntax? If so, let us know, or people will just use the easy way.


©2008 Key College Publishing. All rights reserved.