Data Matters with Excel®
Activity 7.1
If you were thinking of doing a z-test of a sample mean, you might subtract that mean from the population mean you were interested in and divide by the standard error that you estimate from the samples standard deviation. Section 7.1 claims that what you would get is something that is not normally distributed, but distributed more like a narrow volcano.
In this project, you will test this claim. To start, you will take samples with two observations each. Youll be taking the samples from a normal distribution with a mean of 0 and standard deviation of 1. For each of several hundred samples, you will calculate a statistic that is the sample mean divided by the standard error estimated from the sample standard deviation.
Open a new worksheet. Label column A Sample. In cells A2 and A3, enter the formula =normsinv(rand()) . You know what rand() does: It returns a random number from 0 to 1. Normsinv() produces numbers from a standard normal distribution. A standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Normsinv() takes a proportion (represented as a decimal) and returns the z-value from a normal distribution that has that proportion below it. For example, normsinv(.5) is 0, and normsinv(.025) is 1.96. So normsinv(rand()) produces random observations from a normal distribution with a mean of 0 and standard deviation of 1. (Dont believe it? Get a lot, then get a histogram of them.)
Label column B Mean; column C Standard Deviation; column D Standard Error; column E t ; and column F t-values. This table shows the columns, their labels, and their formulas.
|
Column
|
B |
C |
D |
E
|
|
Label
(in row 1)
|
Mean |
Standard Deviation |
Standard Error |
t
|
|
Formula
(in row 2)
|
=average(A2:A3) |
=stdev(A2:A3) |
=C2/sqrt(2) |
=B2/D2
|
Because we are thinking of these t-values as part of a test that the population mean is 0, the formula for t is really =(B2-0)/D2, but theres not really any reason to include subtracting 0.
Now you will use a macro to take new samples and store the t-values in column F (the sixth column).
Sub takeSamples()
For SampleRow = 2 TO 501 Change 501 to get as many samples as you like.
Calculate
Cells(SampleRow,6).Value = Cells(2,5).Value
Next SampleRow
End Sub
|
You can get a histogram of the t-values (Tools, Data Analysis, Histogram, and so on).
What do you think about Gossets claim, discussed on page 392 in the text? Does it look as if you could do a regular z-test with those statistics?
Sort the statistics and check the value that is 2.5% up the list. With two observations, there is one degree of freedom, and Gosset claimed that 95% of the t-values would fall between 12.7 and 12.7. Did that work for your data? Try 4,000 samples. Do 12.7 and 12.7 work with 4,000 samples? Why does it make a difference how many samples you take?
If you used 2 and 2 as your cutoffs for significance, how often would you have rejected the true null hypothesis?
Try other population standard deviations and other sample sizes. Do they make a difference?
| ©2008 Key College Publishing. All rights reserved. |
|