Data Matters with Fathom! Dynamic Statistics software
Activity 7.3
Section 7.3 suggests two ways to estimate the population variance from data in groups. One way looks at how the group means vary. The other way looks at how the observations within each group vary from the group mean. In this project, you will collect these estimates from many random samples from Rep US Sample. You can then explore the estimates and see whether they seem to work well.
Fathom computes both estimates, but it does not label them as such. To start, you will calculate the estimates by hand for two different datasets, then check them against Fathom's output to confirm that it does provide the two estimates.
The first dataset is:
- Group A: 1, 2, 6
- Group B: 4, 5, 6
Here are the calculations to find the two estimates:
Estimating Variance from Variation Within Groups
| Group | Number | Group Mean | Deviation | Squared Deviations |
|---|---|---|---|---|
| A | 1 | 3 | 2 | 4 |
| A | 2 | 3 | 1 | 1 |
| A | 6 | 3 | 3 | 9 |
| B | 4 | 5 | 1 | 1 |
| B | 6 | 5 | 1 | 1 |
| B | 5 | 5 | 0 | 0 |
| Sum of Squared Deviations: | | | | 16 |
| Degrees of Freedom: | | | | 4 |
| Estimated Variance: | | | | 4 |
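If you would like to check the within-groups arithmetic outside Fathom, here is a minimal Python sketch. Python is not part of this activity, and the function name `within_group_estimate` is made up for illustration. It sums the squared deviations of each number from its own group mean and divides by the degrees of freedom (number of observations minus number of groups).

```python
# First dataset from the activity.
groups = {"A": [1, 2, 6], "B": [4, 5, 6]}

def within_group_estimate(groups):
    """Return (sum of squared deviations, degrees of freedom, estimated variance)."""
    ss = 0.0
    n = 0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Deviations are measured from the group's own mean.
        ss += sum((x - group_mean) ** 2 for x in values)
        n += len(values)
    df = n - len(groups)  # one degree of freedom is used up per group mean
    return ss, df, ss / df

ss_within, df_within, ms_within = within_group_estimate(groups)
print(ss_within, df_within, ms_within)  # 16.0 4 4.0
```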
Estimating Variance from Variation Between Groups
| Group | Number | Group Mean | Overall Mean | Deviation | Squared Deviations |
|---|---|---|---|---|---|
| A | 1 | 3 | 4 | 1 | 1 |
| A | 2 | 3 | 4 | 1 | 1 |
| A | 6 | 3 | 4 | 1 | 1 |
| B | 4 | 5 | 4 | 1 | 1 |
| B | 6 | 5 | 4 | 1 | 1 |
| B | 5 | 5 | 4 | 1 | 1 |
| Sum of Squared Deviations: | | | | | 6 |
| Degrees of Freedom: | | | | | 1 |
| Estimated Variance: | | | | | 6 |
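The between-groups calculation can be sketched the same way. This is an illustrative Python version, not something Fathom or the text provides, and `between_group_estimate` is a hypothetical name. Each observation contributes the squared deviation of its group mean from the overall mean, which is the same as weighting each group's squared deviation by its size.

```python
# First dataset from the activity.
groups = {"A": [1, 2, 6], "B": [4, 5, 6]}

def between_group_estimate(groups):
    """Return (sum of squared deviations, degrees of freedom, estimated variance)."""
    all_values = [x for values in groups.values() for x in values]
    overall_mean = sum(all_values) / len(all_values)
    # One squared deviation per observation: group mean minus overall mean.
    ss = sum(
        len(values) * (sum(values) / len(values) - overall_mean) ** 2
        for values in groups.values()
    )
    df = len(groups) - 1  # number of groups minus one
    return ss, df, ss / df

ss_between, df_between, ms_between = between_group_estimate(groups)
print(ss_between, df_between, ms_between)  # 6.0 1 6.0
```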
To see where Fathom reports the two estimates of the variances, drag a new case table onto the Fathom workspace and enter the above data into two attributes, Group and Number.
When you're done, the case table should look like this.
| Group | Number |
|---|---|
| A | 1 |
| A | 2 |
| A | 6 |
| B | 4 |
| B | 6 |
| B | 5 |
Once that data is entered, select Analyze, Test Hypothesis, Empty Test, Analysis of Variance. Drag Number onto Response attribute and Group onto Grouping attribute.
Fathom returns this table.
| Source of variation | df | Sum of squares | Mean square |
|---|---|---|---|
| Groups | 1 | 6 | 6 |
| Error | 4 | 16 | 4 |
The two estimates of variance appear on the right. The top row has the variance estimated from variation between the groups. The bottom row has the variance estimated from variation within the groups. The table does not line up, but Fathom is calling the variance estimates the Mean Square and puts the variation between the groups in a Groups row. Fathom puts the variation within the groups in an Error row.
So that you can be sure the analysis of variance (ANOVA) provides these two estimates, here is another dataset that you can enter and get the estimates for.
- Group A: 10, 30
- Group B: 0, 60
- Group C: 30, 50
Here are the calculations to get the two variance estimates.
Estimating Variance from Variation Within Groups
| Group | Measure | Group Mean | Deviation | Squared Deviations |
|---|---|---|---|---|
| A | 10 | 20 | 10 | 100 |
| A | 30 | 20 | 10 | 100 |
| B | 0 | 30 | 30 | 900 |
| B | 60 | 30 | 30 | 900 |
| C | 30 | 40 | 10 | 100 |
| C | 50 | 40 | 10 | 100 |
| Sum of Squared Deviations: | | | | 2200 |
| Degrees of Freedom: | | | | 3 |
| Estimated Variance: | | | | 733.3333 |
Estimating Variance from Variation Between Groups
| Group | Measure | Group Mean | Overall Mean | Deviation | Squared Deviations |
|---|---|---|---|---|---|
| A | 10 | 20 | 30 | 10 | 100 |
| A | 30 | 20 | 30 | 10 | 100 |
| B | 0 | 30 | 30 | 0 | 0 |
| B | 60 | 30 | 30 | 0 | 0 |
| C | 30 | 40 | 30 | 10 | 100 |
| C | 50 | 40 | 30 | 10 | 100 |
| Sum of Squared Deviations: | | | | | 400 |
| Degrees of Freedom: | | | | | 2 |
| Estimated Variance: | | | | | 200 |
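To compare both hand calculations for the second dataset with a Groups/Error layout like the one Fathom reports, the following Python sketch builds the two rows in one pass. It is only an illustration under the assumption that you have Python available. The function name `anova_mean_squares` and the row labels are chosen here to mirror Fathom's table and are not Fathom's API.

```python
def anova_mean_squares(groups):
    """Return {row label: (df, sum of squares, mean square)} for grouped data."""
    all_values = [x for values in groups.values() for x in values]
    overall_mean = sum(all_values) / len(all_values)
    ss_groups = 0.0  # variation of group means around the overall mean
    ss_error = 0.0   # variation of observations around their group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_groups += len(values) * (group_mean - overall_mean) ** 2
        ss_error += sum((x - group_mean) ** 2 for x in values)
    df_groups = len(groups) - 1
    df_error = len(all_values) - len(groups)
    return {
        "Groups": (df_groups, ss_groups, ss_groups / df_groups),
        "Error": (df_error, ss_error, ss_error / df_error),
    }

# Second dataset from the activity.
second = {"A": [10, 30], "B": [0, 60], "C": [30, 50]}
table = anova_mean_squares(second)
for row, (df, ss, ms) in table.items():
    print(f"{row:<7}{df:>3}{ss:>8.1f}{ms:>10.4f}")
```

The mean squares it prints should agree with the estimated variances in the two tables above: 200 for the Groups row and 733.3333 for the Error row.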
Enter this data into the case table and Fathom will update the analysis of variance. Check that the table really includes the two estimates.
We are going to collect the estimates from many samples. To see how that's going to work, select the Test Hypothesis box, then choose Analyze, Collect Measures. Drag a case table onto the workspace to see what measures are collected and where the two variance estimates will appear.
In the Measures Collection, the variance from variation between the groups is called ms_treatments. The variance from variation within the groups is called ms_error.
Now that you know where the variance estimates will be, we can take samples and see how these estimates do.
Open Rep US Sample. Select Analyze, Sample Cases. Inspect the sample collection to set sample size and Animation as you like. Each sample needs to contain more than one category and at least two observations in at least one category, so you'll need at least three observations in each sample. You will be able to try different sample sizes.
Drag a case table onto the workspace so you can see the attributes in the Sample Collection.
Select Analyze, Test Hypothesis, Empty Test, Analysis of Variance. Drag a numeric attribute onto Response attribute and a categorical attribute onto Grouping attribute.
Select Analyze, Collect Measures. Drag a case table onto the workspace so you can see the variables in the Measures Collection.
Drag two new graphs onto the workspace and drag ms_error and ms_treatments into the graphs.
Inspect the Measures Collection to set the number of samples and Animation as you like.
Select Analyze, Estimate Parameters, Empty Test, Estimate Mean. Drag ms_error and ms_treatments onto the Estimate Parameters box to get their means. Drag your number-line attribute from the population data (the original collection) onto the Estimate Parameters box to get its standard deviation. Square the standard deviation to get its variance.
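The sampling steps above can also be sketched outside Fathom. The Python below is a rough simulation under stated assumptions, not a reproduction of Rep US Sample: the population is invented (three hypothetical categories X, Y, Z that all share one distribution), and the names `mean_squares` and `collect_measures` are made up for illustration. Real data may not have categories with equal means, so treat the result as a baseline to compare your Fathom results against.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in population: categories are assigned at random,
# so every category has the same underlying mean and variance.
population = [(random.choice("XYZ"), random.gauss(50, 10)) for _ in range(5000)]
pop_variance = statistics.pvariance([value for _, value in population])

def mean_squares(sample):
    """Return (ms_treatments, ms_error) for (category, value) pairs, or None."""
    groups = {}
    for label, value in sample:
        groups.setdefault(label, []).append(value)
    n, k = len(sample), len(groups)
    if k < 2 or n <= k:
        return None  # need multiple categories and at least one repeated category
    overall = statistics.fmean(value for _, value in sample)
    ss_treatments = sum(
        len(v) * (statistics.fmean(v) - overall) ** 2 for v in groups.values()
    )
    ss_error = sum(
        sum((x - statistics.fmean(v)) ** 2 for x in v) for v in groups.values()
    )
    return ss_treatments / (k - 1), ss_error / (n - k)

def collect_measures(sample_size, repetitions):
    """Draw random samples and collect both mean squares from each usable one."""
    results = []
    while len(results) < repetitions:
        ms = mean_squares(random.sample(population, sample_size))
        if ms is not None:
            results.append(ms)
    return results

measures = collect_measures(sample_size=12, repetitions=2000)
avg_ms_treatments = statistics.fmean(m[0] for m in measures)
avg_ms_error = statistics.fmean(m[1] for m in measures)

print(f"population variance:   {pop_variance:.1f}")
print(f"mean of ms_treatments: {avg_ms_treatments:.1f}")
print(f"mean of ms_error:      {avg_ms_error:.1f}")
```

Changing `sample_size` here plays the same role as changing the sample size in the Sample Collection inspector.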
How do the estimated variances compare with the real variance of the population? Are the claims in Section 7.3 sensible?
Try different sample sizes. Would you like to propose any warnings for these variance estimates?
©2008 Key College Publishing. All rights reserved.