Data Matters with Fathom! Dynamic Statistics™ software

Activity 2.1

Section 2.1 claims that a random sample of a population will tend to be roughly representative of that population, and that this is especially true of larger random samples. The claim applies to every aspect of the population. For example, even though random sampling pays no attention to the number of men and women in the population, it tends to produce a sample with roughly the same proportions of men and women as the population. The same holds for any other aspect of the population.

This claim sounds dubious. In fact, it is so counterintuitive that statisticians who should know better have lost confidence in it. In this project, you will check whether this claim is true (and what I mean by “roughly” representative will become clear).

Starting with a population of 50,000 people, you will find out what the proportions are in that population and take some random samples. The claim is that the samples will have roughly the same proportions as the population. Let’s see.

For this project (and some later ones), you will need a copy of the RepUSSampleMarch2001.ftm file.

The data file includes data about 50,000 Americans surveyed in March 2001. These 50,000 Americans are a roughly representative sample of Americans. For this project, you are going to think of them as a population.

In Fathom, open RepUSSampleMarch2001.ftm. (If you are unable to open this file, or if it takes a very long time to open, there are two smaller files containing smaller populations. You can use one of these smaller files for this project if you wish.) The first attribute is ID_Number. It is an identification number for each person. You are going to take random samples of the population, and you can tell who was selected by looking at their ID numbers.

The third attribute is Education. It records the highest level of education each person has attained. For example, the first person (ID number 1) is a high-school graduate who did not go to college at all. (Education was recorded only for people older than 15, as you can confirm by looking at the ages in the second column.)

Start by looking at the population. Open the Estimate Parameters window (select Analyze, then Estimate Parameters). You want proportions, so select Empty Estimate, then Estimate Proportion. Now drag Education and drop it on Attribute (categorical): <unassigned>.

Record

Record several proportions from the population. Include the proportion who have high-school diplomas, the proportion who are under 16, and the proportion who have bachelor’s degrees.
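If you want to see what Estimate Proportion is computing, here is a minimal Python sketch that works outside Fathom. It assumes a synthetic population: the category labels and weights below are invented for illustration and are not the values in RepUSSampleMarch2001.ftm. A proportion is simply the number of people in a category divided by the total number of people.

```python
import random
from collections import Counter

# Hypothetical category labels and weights, invented for illustration only;
# they are NOT the values in RepUSSampleMarch2001.ftm.
CATEGORIES = ["Under 16", "No diploma", "High school", "Some college", "Bachelor's"]
WEIGHTS = [0.22, 0.12, 0.25, 0.21, 0.20]

def make_population(size, seed=0):
    """Build a synthetic population of (id_number, education) pairs."""
    rng = random.Random(seed)
    education = rng.choices(CATEGORIES, weights=WEIGHTS, k=size)
    return [(i + 1, edu) for i, edu in enumerate(education)]

def proportions(values):
    """Return the proportion of the values that falls in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

population = make_population(50_000)
pop_props = proportions([edu for _, edu in population])
for category in CATEGORIES:
    print(f"{category:>13}: {pop_props[category]:.3f}")
```

The printed proportions land close to the weights used to build the population, because 50,000 people is a large number of draws.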

Now you will take a random sample. Click the population collection, then select Analyze, then Sample Cases. A new collection appears whose name begins with Sample of. Double-click on the sample collection. A sample collection has some specific features. Animation on controls whether the display is updated after each new person is added to the sample collection. Leave Animation on checked; it will be helpful to see how things change with each additional person.

With replacement controls whether the random sampling routine can select the same person more than once. When With replacement is checked, a person might be randomly selected two or more times. When it is not checked, each person can be selected only once. In a typical survey no one is counted more than once, so surveys are usually done without replacement. Whether or not replacement occurs affects how closely the sample can be expected to approximate the population. You can experiment with this setting to see how it affects the samples you get.
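The difference between sampling with and without replacement can be sketched in Python. This is an illustration, not Fathom's own sampling routine, and it assumes ID numbers run from 1 to 50,000. In Python's standard library, random.sample draws without replacement and random.choices draws with replacement.

```python
import random

ids = list(range(1, 50_001))   # assumed ID_Number values 1..50,000
rng = random.Random(1)

# Without replacement: each person can appear at most once.
without = rng.sample(ids, 400)
# With replacement: every draw is independent, so repeats are possible.
with_repl = rng.choices(ids, k=400)

print("distinct IDs without replacement:", len(set(without)))
print("distinct IDs with replacement:   ", len(set(with_repl)))
```

The first count is always 400. With 400 draws from 50,000 people, a with-replacement sample still contains at least one repeated person more often than not (roughly 80% of the time, by a birthday-problem estimate), but the repeats are so few that they hardly change the sample's proportions.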

Empty this collection first controls what happens each time you click on Sample More Cases. If Empty this collection first is checked, the sample is emptied before people are added to it. If Empty this collection first is not checked, then every time you click on Sample More Cases you add people to the sample, and the sample gets larger and larger. You want new samples. Make sure Empty this collection first is checked.
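The effect of Empty this collection first can be mimicked with a small function. The name sample_more_cases is hypothetical and is not part of Fathom; the sketch also assumes that each click draws its new cases independently of earlier clicks, without replacement within a single click.

```python
import random

def sample_more_cases(sample, population, n, empty_first=True, rng=random):
    """Hypothetical stand-in for one click of Sample More Cases.

    Draws n cases without replacement. If empty_first is True the previous
    sample is discarded, so every click gives a fresh sample of n cases;
    otherwise the new cases are appended and the sample keeps growing.
    """
    new_cases = rng.sample(population, n)
    if empty_first:
        return new_cases
    return sample + new_cases

population = list(range(1, 50_001))   # ID numbers 1..50,000
rng = random.Random(2)

fresh = []
for _ in range(3):                     # three clicks, emptying first
    fresh = sample_more_cases(fresh, population, 10, empty_first=True, rng=rng)

grown = []
for _ in range(3):                     # three clicks, not emptying
    grown = sample_more_cases(grown, population, 10, empty_first=False, rng=rng)

print(len(fresh), len(grown))   # 10 30
```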

Collect new sample when source changes makes the sample collection refresh itself every time the collection it samples from changes. At this point, however, you are not going to change the population collection, so this feature won’t matter here.

Below Collect new sample when source changes, there is a text box next to cases. The number in this box sets how many cases are included in the sample. The default value is 10, so the sample collection starts with 10 cases.

Get a case table for the sample collection so you can see its data. Look at the ID numbers and click on Sample More Cases several times. Can you see any pattern in the people who are chosen? You shouldn’t be able to. Every person in the population collection was equally likely to be selected for the sample each time. Predicting which person is selected is like predicting which side will be up when you roll a die.
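One way to check that every person has the same chance of being selected is to draw many samples and count how often each person turns up. The Python sketch below does this for a toy population of 20 people (a simplification; the real file has 50,000) with samples of 10 drawn without replacement.

```python
import random
from collections import Counter

population_ids = list(range(1, 21))   # a toy population of 20 people
rng = random.Random(3)
trials = 20_000
counts = Counter()
for _ in range(trials):
    counts.update(rng.sample(population_ids, 10))   # one sample of 10, no repeats

expected = trials * 10 / len(population_ids)        # 10,000 selections per person
print("expected per person:", expected)
print("fewest:", min(counts.values()), "most:", max(counts.values()))
```

Each person has a 10/20 chance of landing in any one sample, so each should appear about 10,000 times. No single sample is predictable, but over many samples the counts even out.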

Increase the sample size to 400 (type 400 in the text box next to cases and click on Sample More Cases).

Earlier you found some of the proportions in the population. Will you get those same proportions in the random sample? Because the sample is a random sample, its proportions should be roughly the same. By “roughly,” in this case, I mean within 5 percentage points. For example, the proportion of people in the sample who have a bachelor’s degree is unlikely to be more than 5 percentage points away from the population’s proportion.
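To see why 5 percentage points is a reasonable meaning for “roughly” with samples of 400, you can repeat the sampling many times in a simulation. This sketch assumes a hypothetical population of 50,000 in which exactly 20% have a bachelor’s degree; that figure is made up and is not the proportion in the data file. It also prints the standard error of a sample proportion, sqrt(p(1 - p)/n), ignoring the tiny finite-population correction.

```python
import math
import random

# Hypothetical population: 50,000 people, exactly 20% with a bachelor's degree.
# (Illustrative figure only, not taken from RepUSSampleMarch2001.ftm.)
population = [True] * 10_000 + [False] * 40_000
p = sum(population) / len(population)          # 0.2

rng = random.Random(4)
n, reps, tolerance = 400, 1_000, 0.05
within = 0
for _ in range(reps):
    p_hat = sum(rng.sample(population, n)) / n  # sample proportion
    if abs(p_hat - p) <= tolerance:
        within += 1

print(f"population proportion: {p:.3f}")
print(f"samples within 5 points: {within / reps:.1%}")
print(f"standard error sqrt(p(1-p)/n): {math.sqrt(p * (1 - p) / n):.3f}")
```

For this population the standard error is 0.02, so 5 points is about 2.5 standard errors, and nearly all of the simulated samples fall inside that band. Because p(1 - p) is at most 0.25, the standard error for n = 400 is never more than 0.025, so 5 points is always at least two standard errors.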

Record the proportions in the sample. Note whether the random samples actually are roughly representative of the population they were drawn from. Play with this a while. Try a few more samples. You can leave the proportion screens showing and watch them change. Note that they nearly always stay near the population’s proportions.

The most important thing to note is that even though there are many proportions in this population, the random sampling creates a sample that roughly matches all of the proportions simultaneously and does so without looking at those attributes. All it does is select people by a system that gives every person an equal chance of being selected for the sample.
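The point that one random sample matches many proportions at once can also be sketched in Python. The two attributes here (sex, and whether the person is under 16) and their rates are made up for illustration. The call rng.sample chooses rows uniformly at random and never reads either attribute.

```python
import random
from collections import Counter

rng = random.Random(5)
# Hypothetical population of 50,000 people with two made-up attributes.
people = [
    {"sex": rng.choice(["F", "M"]), "under16": rng.random() < 0.22}
    for _ in range(50_000)
]

def props(rows, attr):
    """Proportion of rows taking each value of one attribute."""
    counts = Counter(row[attr] for row in rows)
    return {value: count / len(rows) for value, count in counts.items()}

# Rows are picked uniformly; the attributes are never consulted.
sample = rng.sample(people, 400)

for attr in ("sex", "under16"):
    pop, samp = props(people, attr), props(sample, attr)
    for value in sorted(pop):
        print(f"{attr}={value}: population {pop[value]:.3f}, "
              f"sample {samp.get(value, 0):.3f}")
```

Both attributes come out close to their population proportions in the same sample, even though the sampling step had no access to either of them.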

Save the File

You will use the Rep US Sample data many times in later projects. Save the data file on your computer so you can access it again.


©2008 Key College Publishing. All rights reserved.