Data Matters with SPSS®

Activity 2.1

Section 2.1 claims that a random sample of a population will tend to be roughly representative of that population, especially when the sample is large. The claim applies to every aspect of the population at once. For example, random sampling pays no attention to how many men and women are in the population, yet it tends to produce a sample with roughly the same proportions of men and women as the population. The same holds for any other attribute.

This claim sounds dubious. In fact, it is so counterintuitive that even statisticians who should know better have lost confidence in it. In this project, you will check whether the claim is true, and you will see what I mean by “roughly” representative.

Starting with a population of 50,000 people, you will find out what the proportions are in that population and take some random samples. The claim is that the samples will have roughly the same proportions as the population. Let’s see.

For this project (and some later ones), you will need a copy of the data file RepUSSample.sav.

The data in that file are from 50,000 Americans surveyed in March 2001. They are a roughly representative sample of Americans, but more about that later. For now, think of them as 50,000 people who live in a town.

The first column in the data is ID_Number. This is an identification number for each person. You are going to take random samples of the population, and you can tell who was selected by looking at their ID numbers.

The third column is labeled Education. It is the highest level of education each person has obtained. For example, the first person (ID number 1) is a high-school graduate who did not go to college at all. (Education was recorded only for people 16 or older, as you can confirm by checking the ages in the second column.)

You want to know about the population. To find the proportions of people with each level of education, select Analyze, Descriptive Statistics, Frequencies. A dialog box pops up. Select Education and click the triangle (arrow) button in the middle to move Education into the Variable(s) box. Click OK and look at the output. It lists 21.8% of the people as N/A. These are the children under 16, whose education level was not recorded.
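If you are curious what the Frequencies command is computing, its Percent column is each category's count divided by the total number of cases. Here is a minimal Python sketch of that calculation. The helper name `percent_table` is hypothetical, and the ten education values are made up for illustration; they are not the RepUSSample data.

```python
from collections import Counter

def percent_table(values):
    """Percent of all cases in each category, like the Percent column of a frequency table."""
    total = len(values)
    return {category: 100 * count / total for category, count in Counter(values).items()}

# Made-up education values for ten people; NOT the RepUSSample data.
education = ["HS grad", "N/A", "Bachelor", "HS grad", "N/A",
             "Some college", "HS grad", "Bachelor", "N/A", "HS grad"]

for category, percent in sorted(percent_table(education).items()):
    print(f"{category:<13}{percent:5.1f}%")
```

With these ten values the table shows HS grad 40.0%, N/A 30.0%, Bachelor 20.0%, and Some college 10.0%.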

Record several proportions from the population. Include the proportion who have high-school diplomas, the proportion who are under 16, and the proportion who have bachelor’s degrees.

Now you will take a small random sample of the population. In the Data Editor, click Data, Select Cases. A dialog box pops up. Click the white dot next to Random sample of cases. Then click the Sample button, which is now active. Another dialog box pops up. Click the white dot next to Exactly and enter 10 and 50000 so the sentence reads Exactly 10 of the first 50000 cases. (Using all 50000 cases gives every person in the population a chance to be chosen.) Click Continue and then OK. SPSS marks each case that is not in the random sample with a slash through its row number.
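The idea behind this dialog box is simple random sampling without replacement: every set of 10 distinct cases is equally likely to be the sample. The sketch below illustrates that idea with the standard library's `random.sample`. The function name `exactly_n_of_first` is hypothetical, and this shows the concept only, not how SPSS implements the option internally.

```python
import random

def exactly_n_of_first(n, first_n_cases, seed=None):
    """Pick exactly n distinct case numbers from 1..first_n_cases, each case equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no case can appear twice.
    return sorted(rng.sample(range(1, first_n_cases + 1), n))

print(exactly_n_of_first(10, 50000))  # a different set of 10 ID numbers on each run
```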

Look at the ID numbers by getting the frequency of each ID number: Select Analyze, Descriptive Statistics, Frequencies. Double-click ID_Number to move it into the Variable(s) box and click OK.

Repeat this process several times, selecting a new random sample and listing its ID numbers each time. Can you see any pattern in which people are chosen? You shouldn’t be able to. Every person in the population was equally likely to be selected for the sample each time.
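You can also check the “equally likely” claim numerically instead of by eye. This sketch draws many samples of 5 from a small made-up population of 20 IDs (kept small so it runs quickly) and reports how often each ID was chosen. The helper `selection_rates` is hypothetical. Each rate should come out close to 5/20 = 0.25, with no ID favored.

```python
import random
from collections import Counter

def selection_rates(population_size, sample_size, repetitions, seed=0):
    """Fraction of repetitions in which each ID landed in the sample."""
    rng = random.Random(seed)
    ids = range(1, population_size + 1)
    hits = Counter()
    for _ in range(repetitions):
        hits.update(rng.sample(ids, sample_size))  # count every ID chosen this round
    return {i: hits[i] / repetitions for i in ids}

rates = selection_rates(population_size=20, sample_size=5, repetitions=20000)
print(min(rates.values()), max(rates.values()))  # both should be near 0.25
```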

Now take a random sample of 400 people. Open the random sampling dialog box again and edit the sentence to read Exactly 400 of the first 50000 cases.

Earlier you found some of the proportions in the population of 50,000 people. Get the same proportions for the 400-person random sample. Because the sample is a random sample, its proportions should be roughly the same as the population’s. By “roughly” in this case, I mean within 5 percentage points. For example, the proportion of people in the sample who have a bachelor’s degree is unlikely to differ from the population’s proportion by more than 5 percentage points.
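Why 5 percentage points for a sample of 400? Here is a rough check, assuming the usual normal approximation for a sample proportion and ignoring the finite-population correction (400 is less than 1% of 50,000). The standard error of a sample proportion is largest when the population proportion p is 0.5, and about 95% of samples land within two standard errors of p.

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{n}}
  \le \sqrt{\frac{0.5 \times 0.5}{400}} = \frac{0.5}{20} = 0.025,
\qquad 2\,\mathrm{SE} \le 0.05
```

For the 10-person samples you took earlier, the same rough bound is about 0.16, so two standard errors is about 0.32. That is why such small samples can look much less representative.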

Record

Record the proportions in the sample. Note whether the random samples are actually roughly representative of the population they were drawn from.

Experiment with this for a while. Take a few more samples and compare each sample’s proportions with the population’s. Notice that they almost always stay near the population’s proportions. In SPSS the output window does not update on its own, so rerun the Analyze command every time you take a new random sample.
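To see “almost always” in numbers rather than by repeated clicking, this sketch repeats the 400-person sampling many times and reports the share of samples whose proportion lands within 5 percentage points of the population’s. The population is made up (50,000 people, exactly 20% of whom have some attribute), and `fraction_within` is a hypothetical helper. With these settings the share should come out close to 1, but not necessarily exactly 1.

```python
import random

def fraction_within(population, sample_size, tolerance, repetitions, seed=0):
    """Share of random samples whose proportion of 1s is within `tolerance` of the population's."""
    rng = random.Random(seed)
    p_pop = sum(population) / len(population)
    close = 0
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # simple random sample, no repeats
        p_hat = sum(sample) / sample_size
        if abs(p_hat - p_pop) <= tolerance:
            close += 1
    return close / repetitions

# Made-up population: 50,000 people, 10,000 of whom (20%) have the attribute (coded 1).
population = [1] * 10_000 + [0] * 40_000
print(fraction_within(population, sample_size=400, tolerance=0.05, repetitions=1000))
```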

The most important thing to note is that although the population has many different proportions, random sampling produces a sample that roughly matches all of them simultaneously, and it does so without ever looking at those attributes. All it does is select people by a system that gives every person an equal chance of being selected for the sample.
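The same point, that one random draw tends to match every attribute at once, can be sketched with a made-up population whose members each carry two attributes. The sampler picks whole records and never inspects either attribute. The hypothetical helper `max_gap` then reports, for one attribute, the largest difference between population and sample proportions. Because the population here is randomly generated, the printed numbers are illustrative only.

```python
import random
from collections import Counter

def proportions(values):
    """Map each category to its share of `values`."""
    total = len(values)
    return {k: v / total for k, v in Counter(values).items()}

def max_gap(population, sample, index):
    """Largest |population share - sample share| over the categories of one attribute."""
    pop = proportions([person[index] for person in population])
    samp = proportions([person[index] for person in sample])
    return max(abs(pop[c] - samp.get(c, 0.0)) for c in pop)

rng = random.Random(42)
# Made-up population of 50,000 (sex, education) records; NOT the RepUSSample data.
people = [(rng.choice(["F", "M"]),
           rng.choice(["N/A", "HS grad", "Some college", "Bachelor"]))
          for _ in range(50_000)]

# Simple random sample of 400 whole records; neither attribute is looked at here.
sample = rng.sample(people, 400)

print("largest gap, sex:      ", round(max_gap(people, sample, 0), 3))
print("largest gap, education:", round(max_gap(people, sample, 1), 3))
```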

Save the File

You will use the RepUSSample data many times in later projects. Save the data file on your computer so you can access it later.


©2008 Key College Publishing. All rights reserved.