Data Matters with Fathom Dynamic Statistics Software
Activity 10.2
Violations of Pearson's assumptions of constant variance and normality can wreck Pearson's correlation test. How much can they mess it up? In this project, you will find out.
You will set up Fathom to run significance tests for correlation on data for which the null hypothesis is true, and then keep track of how often that true null hypothesis is rejected. You will do this by first setting up Fathom to generate and test the data, then trying a variety of distributions to see how the test holds up.
Setting Fathom to Generate
Drag a case table onto the workspace. Add two attributes, x and y. Right-click each attribute to edit its formula, and use the formula randomNormal(0,1) for both. Later you will change these formulas to mess up Pearson's test.
Drag a new hypothesis test onto the workspace. Select Empty Test, Test Correlation. Drag x and y onto the Test Correlation attributes.
Select Analyze, Collect Measures. Drag a case table onto the workspace. The first column holds the p-values (pValue). Add an attribute, Significant, using the formula pValue < .05. Drag a new estimate onto the workspace, then select Empty Estimate, Estimate Proportion. Drag Significant onto the attribute slot of the Estimate Proportion window. Now you can see what proportion of the samples lead Pearson's correlation test to reject the true null hypothesis.
Double-click the Measures Collection to set the number of samples and the animation as you like. Use more than 400 samples.
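For readers who want to see the mechanics outside Fathom, here is a minimal Python sketch of the same experiment, written under stated assumptions: the sample size n = 20, the seed, and the helper names (pearson_r, pearson_p_value, rejection_rate, standard_normal) are hypothetical, and random.gauss stands in for randomNormal(0,1). The p-value is computed from the exact null distribution of r for normal data, whose density is proportional to (1 - r^2)^((n-4)/2), integrated numerically with Simpson's rule. This gives the same two-sided p-value as the t-test with n - 2 degrees of freedom, but it is not necessarily how Fathom computes its p-values.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n, steps=400):
    """Two-sided p-value for H0: no correlation (normal theory, n >= 4).

    Under H0 the density of r is c * (1 - s^2)^((n-4)/2) on [-1, 1].
    The tail area beyond |r| is integrated with Simpson's rule (steps must be even).
    """
    a = abs(r)
    if a >= 1.0:
        return 0.0
    k = (n - 4) / 2
    # c = Gamma((n-1)/2) / (sqrt(pi) * Gamma((n-2)/2)), computed on the log scale
    log_c = math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi) - math.lgamma((n - 2) / 2)
    h = (1.0 - a) / steps
    total = 0.0
    for i in range(steps + 1):
        s = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, 1.0 - s * s) ** k
    return min(1.0, 2.0 * math.exp(log_c) * total * h / 3.0)


def rejection_rate(gen_x, gen_y, n=20, num_samples=1000, alpha=0.05, seed=None):
    """Like Collect Measures plus Estimate Proportion: the fraction of samples
    whose p-value is below alpha (the Significant attribute, pValue < .05)."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(num_samples):
        xs = [gen_x(rng) for _ in range(n)]
        ys = [gen_y(rng) for _ in range(n)]
        if pearson_p_value(pearson_r(xs, ys), n) < alpha:
            significant += 1
    return significant / num_samples


def standard_normal(rng):
    """Stand-in for Fathom's randomNormal(0,1)."""
    return rng.gauss(0, 1)


if __name__ == "__main__":
    # x and y are generated independently, so the null hypothesis is true;
    # with normal data the rate should land near alpha = 0.05.
    print(rejection_rate(standard_normal, standard_normal, seed=1))
```

Each pass of the loop plays the role of one collected measure, and the returned fraction corresponds to what the Estimate Proportion window reports for Significant. Passing other generator functions as gen_x and gen_y corresponds to editing the formulas for x and y.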
Try Other Distributions
Return to the original case table, where you can change the sample size.
Try editing the formulas for x and y. The formula editor has menus listing all of the functions at your disposal. To raise a number to a power, click ^. The only restriction is that the formula for x cannot refer to y, and the formula for y cannot refer to x, so that x and y stay independent. Every formula must have a random number at its core. At any point, you can click Apply to see what kinds of numbers you get.
When you have a formula that you would like to test, open the Measures Collection and click Collect More Measures. Here's one that gets the true null hypothesis rejected 16% of the time (at alpha = 5%): randomCauchy()^4.
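The heavy-tailed formula can be simulated the same way outside Fathom. This sketch repeats its helpers so that it runs on its own, and it assumes that Fathom's randomCauchy() with no arguments is the standard Cauchy distribution (location 0, scale 1), generated here by the inverse-CDF formula tan(pi * (U - 1/2)) for a uniform U. The names cauchy_fourth, standard_normal, and rejection_rate, the sample size n = 20, and the seeds are hypothetical. The printed rate depends on n and on the number of samples, so it need not reproduce the 16% figure above.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n, steps=400):
    """Two-sided p-value for H0: no correlation (normal theory, n >= 4)."""
    a = abs(r)
    if a >= 1.0:
        return 0.0
    k = (n - 4) / 2
    log_c = math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi) - math.lgamma((n - 2) / 2)
    h = (1.0 - a) / steps
    total = 0.0
    for i in range(steps + 1):  # Simpson's rule over [|r|, 1]
        s = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, 1.0 - s * s) ** k
    return min(1.0, 2.0 * math.exp(log_c) * total * h / 3.0)


def rejection_rate(gen_x, gen_y, n=20, num_samples=1000, alpha=0.05, seed=None):
    """Fraction of simulated samples with p-value < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        xs = [gen_x(rng) for _ in range(n)]
        ys = [gen_y(rng) for _ in range(n)]
        if pearson_p_value(pearson_r(xs, ys), n) < alpha:
            hits += 1
    return hits / num_samples


def standard_normal(rng):
    """Stand-in for randomNormal(0,1)."""
    return rng.gauss(0, 1)


def cauchy_fourth(rng):
    """Assumed stand-in for randomCauchy()^4: a standard Cauchy draw to the 4th power."""
    c = math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF Cauchy draw
    return c ** 4


if __name__ == "__main__":
    print("normal:  ", rejection_rate(standard_normal, standard_normal, seed=1))
    print("Cauchy^4:", rejection_rate(cauchy_fourth, cauchy_fourth, seed=1))
```

Because cauchy_fourth is called separately for x and for y, neither variable depends on the other, which matches the rule that the formula for x cannot refer to y and vice versa.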
How do the violations of Pearson's assumptions affect the Pearson correlation test?
Testing Independence
Pearson's correlation test also assumes independence. Independence means you can't predict an attribute's observation from the other observations of that attribute. When independence is violated, Pearson's correlation test can be very unreliable. You may have seen this if you used the prev() function in the first part of this project.
For this project, we need only a case table and a scatter plot of the data in the case table.
The case table has two attributes, x and y. The formula for x is caseIndex. The formula for y is conditional: type If(caseIndex = 1). Click the top question mark and enter 0. Click the bottom question mark and enter prev(y) + randomNormal(0,1).
This formula makes y a random walk: the first value is 0, and each later value is the previous value plus a random shift. On average, the changes in y equal 0, so on average there is no trend and no correlation with x.
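The formula for y can be mirrored in Python as a sketch of what each new sample looks like. The names random_walk and pearson_r, the sample size n = 50, and the seed are hypothetical; random.gauss stands in for randomNormal(0,1), and the list x plays the role of caseIndex. Each pass through the loop draws a fresh walk and reports its correlation with x, much like pressing Ctrl-Y and reading the scatter plot.

```python
import math
import random


def random_walk(n, rng):
    """Mirror of the Fathom formula for y:
    if caseIndex = 1 then 0, else prev(y) + randomNormal(0,1)."""
    y = [0.0]  # case 1 starts at 0
    for _ in range(n - 1):
        y.append(y[-1] + rng.gauss(0, 1))  # prev(y) plus a random step
    return y


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(2)
    n = 50
    x = list(range(1, n + 1))  # the formula for x is caseIndex
    for trial in range(1, 6):  # like pressing Ctrl-Y five times
        y = random_walk(n, rng)
        print(f"sample {trial}: r = {pearson_r(x, y):+.3f}")
```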
Drag a new graph onto the workspace. Drag x to the x-axis and y to the y-axis. Select Scatter Plot, Line Scatter Plot.
Press Ctrl-Y to collect a new sample. Right-click on the scatter plot and select Rescale Graph Axes. Repeat these steps to see what happens.
In this part of the project, there are violations of independence. What effect would a lack of independence have on regression analysis?
Most analyses of time-related data, such as stock prices, show that observations are not independent. For example, today's stock price is predictable from yesterday's stock price. How does that affect correlational studies that try to predict something from the date?
©2008 Key College Publishing. All rights reserved.