Data Matters with SPSS®
Activity 10.2
Violations of Pearson's assumptions of constant variance and normality can wreck Pearson's correlation test. How badly can they distort it? In this project, you will find out.
You will set SPSS to perform significance tests for correlation when the null hypothesis is true, then keep track of how often the true null hypothesis is rejected. You will do this by first setting up SPSS to test distributions, then trying a variety of distributions to see how they do.
Setting SPSS to Generate
This Syntax program will create 1,000 samples. Each sample will have four observations, each with an x-value and a y-value. In all of the samples, the null hypothesis will be true. At this point, the Syntax meets Pearson's assumptions: All of the observations are drawn from a normal distribution. Later you will alter that and see how it affects Pearson's correlation test.
INPUT PROGRAM.
LOOP #Sample = 1 TO 1000.
LOOP #Case = 1 to 4.
COMPUTE sample = #Sample.
COMPUTE X = RV.NORMAL(0,1).
COMPUTE Y = RV.NORMAL(0,1).
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
Now you are going to get SPSS to work with one sample at a time, and you'll save the results of a regression test to a new file that has one p-value per sample.
Click on Data, Split File. Click on Organize Output by Groups. Double-click on Sample. Click OK.
Click on Analyze, then select Regression, Linear, Save, Coefficient Statistics. Click on File, then enter a name of a file and put it in a location where you will be able to find it again. Click on Continue. Select Y and click on the black triangle next to the Dependent box. Select X and click on the black triangle next to the Independent(s) box. Click OK.
Find the output file you saved and double-click on the file icon to open it.
We want only the SIG rows, so click on Data, Select Cases. Select If condition is satisfied, then click If. Double-click on rowtype_ and enter = 'SIG'.
Click on Continue, Deleted, OK.
The correlation p-values are listed under X. Use Transform, Compute to calculate a new variable with the formula x < .05, which equals 1 when the p-value is below .05 and 0 otherwise. Use Analyze, Descriptive Statistics, Frequencies to get the frequency of p-values that were less than 5%.
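As a cross-check outside SPSS, the whole procedure above can be sketched in a few lines of Python. This is only a rough sketch, not the SPSS method itself: the function names (pearson_r, p_value_n4, rejection_rate) and the seed are made up for illustration. It relies on one fact about samples of four: the test statistic t = r·sqrt(n-2)/sqrt(1-r²) has 2 degrees of freedom, and for 2 degrees of freedom the two-sided p-value simplifies to 1 - |r|.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate sample: treat as no correlation
    return sxy / math.sqrt(sxx * syy)


def p_value_n4(r):
    """Two-sided p-value for r when n = 4 (t test with 2 df)."""
    return 1.0 - abs(r)


def rejection_rate(n_samples=1000, alpha=0.05, seed=1):
    """Share of samples in which the true null hypothesis is rejected."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatability
    rejections = 0
    for _ in range(n_samples):
        # x and y are drawn independently, so the null hypothesis is true
        xs = [rng.gauss(0, 1) for _ in range(4)]
        ys = [rng.gauss(0, 1) for _ in range(4)]
        if p_value_n4(pearson_r(xs, ys)) < alpha:
            rejections += 1
    return rejections / n_samples


rate = rejection_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

If the assumptions hold, the printed share should land close to 5%, which is what the Frequencies table in SPSS should also show.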
Try Other Distributions
Try other formulas to create the x- and y-values. Go back to the Syntax program and see if you can create distributions for x and y that lead SPSS to reject these true null hypotheses more often than 5% of the time. To get a list of functions that you can use to generate and modify values, in the data editor, click on Transform, Compute, and a scrollable list appears.
The only requirement is that the y-values cannot be built from the x-values, and the x-values cannot have the y-values in their formulas either. That way, they are independent, and the null hypothesis is true.
Rerun the steps to see how often Pearson's correlation test rejects the true null hypotheses.
How do the violations of Pearson's assumptions affect the Pearson correlation test?
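The same experiment can be sketched in Python to compare several distributions side by side. This is an illustrative sketch under stated assumptions: the three generators and all names are hypothetical choices, not ones the activity prescribes, and the shortcut p = 1 - |r| applies only because each sample has four observations. Each x and y is drawn separately, so neither is built from the other and the null hypothesis stays true.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate sample: treat as no correlation
    return sxy / math.sqrt(sxx * syy)


def rejection_rate(draw_x, draw_y, n_samples=1000, alpha=0.05):
    """Share of four-observation samples where the test rejects."""
    rejections = 0
    for _ in range(n_samples):
        xs = [draw_x() for _ in range(4)]
        ys = [draw_y() for _ in range(4)]
        # For n = 4 the two-sided p-value of the test is 1 - |r|
        if 1.0 - abs(pearson_r(xs, ys)) < alpha:
            rejections += 1
    return rejections / n_samples


rng = random.Random(2)  # hypothetical seed, only for repeatability

# Hypothetical example distributions; swap in your own
generators = {
    "normal": lambda: rng.gauss(0, 1),
    "exponential (skewed)": lambda: rng.expovariate(1.0),
    "ratio of normals (heavy-tailed)": lambda: rng.gauss(0, 1) / rng.gauss(0, 1),
}

rates = {}
for name, draw in generators.items():
    # Separate calls to draw() keep x and y independent
    rates[name] = rejection_rate(draw, draw)
    print(f"{name}: {rates[name]:.3f}")
```

Comparing each printed share with 5% shows whether that distribution makes the test reject true null hypotheses too often.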
Testing Independence
Pearson's correlation test also assumes independence. Independence means that you can't predict one observation of an attribute from the other observations of that attribute. When independence is violated, Pearson's correlation test can be very unreliable. If you used the lag() function in the first part of this project, you may have seen that.
For this project, you need only some data and a scatter plot.
Open a new, empty data sheet in the data editor and add a number near the 50th row, so that the sheet has about 50 cases. Use Transform, Compute to create a variable X whose numeric expression is simply $CASENUM. Then add a variable Y and enter 100 for its first observation. Then use this Syntax program to fill in the rest.
IF ($CASENUM > 1) y = LAG(y) + RV.NORMAL(0,1).
EXECUTE.
With this program, for every case after the first, y equals the previous y-value plus a random number. The random numbers have a mean of 0, so overall, there is no trend in y. It just moves up and down randomly.
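The behavior of this program can also be sketched in Python. This is a minimal sketch with hypothetical names (random_walk, pearson_r) and an arbitrary seed: it builds many 50-case series in which each y is the previous y plus a normal random number, then correlates each series with the case number. The average of r across many walks speaks to whether there is a trend on the average, while the average of |r| shows how strong the apparent trend in a single walk can be.

```python
import math
import random


def random_walk(n_cases=50, start=100.0, rng=random):
    """y starts at `start`; each later y is the previous y plus N(0, 1)."""
    y = [start]
    for _ in range(n_cases - 1):
        y.append(y[-1] + rng.gauss(0, 1))
    return y


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate sample: treat as no correlation
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # hypothetical seed, only for repeatability
x = list(range(1, 51))  # plays the role of $CASENUM

# Correlation between case number and y for 200 separate walks
rs = [pearson_r(x, random_walk(rng=rng)) for _ in range(200)]

mean_r = sum(rs) / len(rs)
mean_abs_r = sum(abs(r) for r in rs) / len(rs)
print(f"Average r across walks:   {mean_r:.3f}")
print(f"Average |r| across walks: {mean_abs_r:.3f}")
```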
Click on Graphs, then select Scatter, Define. Assign x to the x-axis and y to the y-axis and click OK.
Rerun this program and get new scatter plots until you understand that on the average, there is no trend in the y-values.
In this part of the project, there are violations of independence. What effect would a lack of independence have on regression analysis?
Most analysis of time-related data, like stock prices, shows that observations are not independent. For example, today's stock price is predictable from yesterday's stock price. How does that affect correlational studies that try to predict something from the date?
©2008 Key College Publishing. All rights reserved.