Data Matters with SPSS®
Activity 2.2
Section 2.2 claims that if you draw random samples from a population and make a histogram of the proportions in those samples, the histogram will be roughly bell-shaped. It will be centered at roughly the probability, or population proportion, used to generate the sample proportions. In this project, you will see for yourself whether that is true.
The project in Section 2.2 requires these steps.
- Set the probability where you would like it, set the sample size where you would like it, and use the random number generator to simulate taking samples.
- Record the proportions that appear in the samples and make a histogram of those proportions. (Check whether the histograms are roughly bell-shaped with centers at the probability that you chose.)
- Repeat the process with larger samples. You'll notice that your histograms are narrower with larger samples. That's the law of large numbers.
Here's how to do each step.
Step 1: Set the probability where you would like it, set the sample size where you would like it, and use the random number generator to simulate taking samples.
You are going to create a sample of random draws and calculate the proportion. You will do this first with the SPSS graphical interface, then with SPSS's programming language, SPSS Syntax. Learning to use Syntax is valuable because you will be taking the samples many times, and being able to do that automatically with Syntax saves a lot of work. (If you are working in an older version of SPSS, Syntax is not available, so you will use the graphical interface to make samples and calculate proportions.)
To create a sample using the SPSS graphical interface, get into the data editor and enter a value in the 10th row or below to let SPSS know that you want data that fill up the rows at least that far. Select Transform, Compute. Enter a name for your variable, draw, then click on the textbox labeled Numeric Expression: . Enter RV.BINOMIAL(1,.5) into that box. (Note: I am capitalizing that so you can tell the difference between the variable names that I made up and the special SPSS commands.) RV.BINOMIAL fills a variable with random numbers created by simulating slips of paper being pulled from a hat. Each slip of paper has a 1 or a 0 on it. RV.BINOMIAL sets the variable to the sum of the numbers drawn from a hat. For example, if 1, 0, and 1 are drawn, RV.BINOMIAL sets the variable to 2.
The numbers in the parentheses determine how RV.BINOMIAL goes about this. The first number is the number of draws. When the number of draws is set to 1, RV.BINOMIAL returns 1s and 0s. If you would like, experiment to see what you get with larger numbers of draws.
The second number is the probability of a 1. For example, RV.BINOMIAL(1,.9) is like one draw from a hat that has nine slips of paper with a 1 and one slip with a 0.
Click OK. Look at the data. There is a variable, draw, that has 1s and 0s. Get the proportions of each by selecting Analyze, Descriptive Statistics, Frequencies. Select draw, click on the triangle in the middle, and click OK.
SPSS is reporting that a particular proportion of the sample were 1s. Look back at the data. Is SPSS correct? Did anything go wrong? (I doubt it would, but checking helps you get clear what SPSS is doing.)
Syntax
Now you're going to take another sample, but this time you'll do it in a way that involves less clicking and selecting. Click on File, then select New, Syntax. A new window opens labeled SPSS Syntax Editor.
The Syntax Editor allows you to use SPSS's programming language. Here's your first program. It fills up draw with 1s.
COMPUTE draw = 1.
EXECUTE.
Note: SPSS Syntax indicates a command by ending it with a period, so the periods are important.
Enter this program, then run it by clicking on the triangle at the top of the Syntax Editor. Look at your data to see what happens. (If you get errors when you click on the triangle to run a program, choose Run all from the Run menu.)
Here is what this Syntax says: COMPUTE is the same as the Compute under the Transform menu. It tells SPSS to calculate values for a variable.
SPSS won't finish the work of COMPUTE until it gets to an EXECUTE statement. This is because sometimes you have a lot of COMPUTEs that could be done at the same time, and it speeds things up if SPSS does them together.
But a bunch of 1s is not very helpful, so edit the program to this.
COMPUTE draw = RV.BINOMIAL(1,.5).
EXECUTE.
Run the program. Now draw is filled with random draws from RV.BINOMIAL(1,.5).
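The menu steps you used earlier to get the proportions of 1s and 0s also have a Syntax form. Here is a minimal sketch, assuming your variable is still named draw. FREQUENCIES is the Syntax counterpart of Analyze, Descriptive Statistics, Frequencies.

```spss
* Tabulate the 1s and 0s in draw, with counts and percents.
FREQUENCIES VARIABLES = draw.
```

Run it after the COMPUTE program and compare the percents with what you see in the data editor.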
Input Program
Whenever we get started creating simulations with SPSS, we have to begin by entering a number in a row a ways down to let SPSS know that we want some observations. That's sort of a hassle. There is an alternative. It's somewhat of a hassle the first time you type it in, but easier thereafter.
This syntax tells SPSS to dump the data that is currently in the data editor and insert 10 lines with a single variable, draw, that it takes from RV.BINOMIAL(1,.5).
INPUT PROGRAM.
LOOP #Case = 1 to 10.
    COMPUTE draw = RV.BINOMIAL(1,.5).
    END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
Note that the indentations indicate tabs, which are essential in some environments. (In SPSS Syntax Editor, it is important to line up the beginnings and endings of commands with tabs. In the example above, INPUT PROGRAM. needs to align with END LOOP. The measure of the tabs doesn't matter, but the tabs must align at the beginning and end of a command in order for Syntax Editor to run the program properly.)
Here is what the lines above in this Syntax mean. Notice that they all end in a period, so they are all commands.
INPUT PROGRAM. tells SPSS that the following commands down to END INPUT PROGRAM. are to be used to create a data file.
LOOP #Case = 1 to 10. #Case is a variable that I named. It could have been #obs or #blah. In SPSS Syntax, a name that starts with the number sign marks a scratch variable, a temporary variable that is not saved in the data file. In older versions of SPSS, variable names cannot be more than eight characters long. This line says, "Set #Case to 1 and do the steps below all the way down to where it says END LOOP. Then add 1 to #Case and start here again. Repeat the steps 10 times."
COMPUTE draw = RV.BINOMIAL(1,.5). is the same command used earlier. It fills draw with draws from RV.BINOMIAL.
END CASE. Each line in an SPSS data file is called a case. This command tells SPSS that we are done creating variables for each case. In this program, we are creating only the variable draw, but we could be creating more variables. All variables have to be computed before END CASE.
END LOOP. is the second part of LOOP #Case = 1 to 10. It tells SPSS that it's time to add 1 to #Case and go back to the command that follows LOOP #Case = 1 to 10.
END FILE. goes with INPUT PROGRAM. It tells SPSS that you are done putting things into the data file.
END INPUT PROGRAM. also goes with INPUT PROGRAM. It tells SPSS not only that you are done adding to the data file but also that you don't want to do anything else to it. You're finished.
Run the program to see what it does. Change the 10 in LOOP #Case = 1 to 10. to another value and run the program to see what happens. Try it with several different values.
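As a sketch of the kind of variation you might try, the program below makes 20 cases and computes two variables before END CASE. The variable name sum3 and the seed value are my own choices for illustration. SET SEED is optional; I am assuming SPSS's default random number generator, in which case setting the seed makes the same draws come up each time you run the program.

```spss
* Optional: fix the random seed so that rerunning gives the same draws.
* The seed value 12345 is arbitrary.
SET SEED = 12345.

INPUT PROGRAM.
LOOP #Case = 1 TO 20.
    COMPUTE draw = RV.BINOMIAL(1,.5).
    COMPUTE sum3 = RV.BINOMIAL(3,.9).
    END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
```

Here draw holds single draws (0 or 1), while sum3 holds the sum of three draws from a hat where 1s come up with probability .9, so it ranges from 0 to 3.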
So far, so good, but you're still stuck calculating the proportions from each sample. To make things easier, you are going to have SPSS generate as many samples as you would like. You'll indicate the samples with a second variable, sample, then use SPSS to calculate the proportions in each sample and come up with a data set containing the proportions. That will allow you to get a histogram and see how the proportions are spread out.
The goal is a data table that looks something like this one.
| Sample | Draw |
| --- | --- |
| 1 | 0 |
| 1 | 1 |
| 1 | 0 |
| 2 | 1 |
| 2 | 0 |
| 2 | 1 |
| 3 | 1 |
| 3 | 0 |
| 3 | 0 |
In this table, the first sample is 0, 1, 0. The second is 1, 0, 1. The third is 1, 0, 0. The first column indicates which sample is which. The difference between this table and what you're looking for is that this table has only three draws per sample, and you will use more draws per sample.
Here's the program that will do the trick. This is the same as a program you already entered except for three new lines: the LOOP #Sample command, the COMPUTE sample command, and the second END LOOP.
INPUT PROGRAM.
LOOP #Sample = 1 TO 5.
    LOOP #Case = 1 to 10.
        COMPUTE sample = #Sample.
        COMPUTE draw = RV.BINOMIAL(1,.5).
        END CASE.
    END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
One change is that a second loop has been added. Now the program sets a variable called #Sample to 1, then loops through #Case equaling 1, 2, 3, 4, and so forth. After #Case has equaled 10, the program goes back and sets #Sample to 2, sets #Case to 1, then cycles through #Case equaling 1, 2, 3, 4, and so on. This continues until #Sample has equaled 5.
Another change is that you are now creating two variables, sample, which indicates which sample each row is a part of, and draw.
The third change is the END LOOP. command for the new loop. (Every loop needs an END LOOP. command.)
Because this program is inputting new data into the data editor, it needs the data editor to be empty. So if the program doesn't seem to run, get into the data editor, click on File, then select New and Data and try again.
Run the program and look at what it does. It is set to create 5 samples, and 10 observations in each sample. Is that what you got? It should be. Try changing the number of samples and the number of observations in each sample. Does everything look right? It should all work. I ask you to check because checking helps you understand what the program is doing.
Aggregate
If that's all you could do, this programming would not be very helpful. Here's the powerful step in all of this.
Save your data file so you can get back to it if you want. Then calculate the proportion of 1s in at least the first two samples of your data so that the information is there if you want to check back.
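If you would rather not count by hand, one way to get those per-sample figures is a crosstabulation. This is a sketch, assuming your variables are named sample and draw as in the program above.

```spss
* Count the 0s and 1s within each sample, with row percents.
CROSSTABS
  /TABLES = sample BY draw
  /CELLS = COUNT ROW.
```

In the output, each row is one sample, and the row percent under draw = 1 is the percentage of 1s in that sample.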
The next program calculates the proportions for each sample and creates a new data set with two columns. The first column will list the samples and the second column will have the proportions. Once we have that data set, you can get a histogram of the proportions to see if they fall where they were claimed to fall.
Get a clean Syntax editor for a new program, then enter this program.
AGGREGATE OUTFILE = *
    /BREAK = sample
    /percent1 = PIN(draw, 1,1).
As far as SPSS is concerned, there is only one command in this program. That's why there is only one period. This is what each line means.
AGGREGATE OUTFILE = *: AGGREGATE summarizes data and puts the summaries into a new data file. If you want the new data file to be in a new file on your computer, you type its name after OUTFILE = . I put in an asterisk (*). The asterisk tells SPSS to put the new data file directly into the data editor, replacing the data that had been there. (That's why I had you save your data file before running this program.)
/BREAK = sample: /BREAK tells SPSS both to do its summarizing in groups and how those groups are defined. In this case, the groups are defined by sample.
/percent1 = PIN(draw, 1,1). This line says, "Create a variable, percent1, and fill it with the percentage of draw that are 1s." PIN(draw,1,1) is the percentage of the observations in draw that are between 1 and 1, that is, the percentage of 1s (as opposed to 0s). PIN reports a percentage from 0 to 100 rather than a proportion from 0 to 1, so a value of 50 corresponds to a proportion of .5. PIN(draw,1,9) would be the percentage of the observations in draw that are between 1 and 9. PIN(sample,3,7) would be the percentage of the observations in sample that are between 3 and 7.
Run the program and look in the data editor to see what you have.
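PIN gives its answer on a 0-to-100 scale. If you would like the results on the same 0-to-1 scale as the probability in RV.BINOMIAL, here is a sketch of a variation that uses AGGREGATE's FIN function (the fraction in a range) and MEAN. The names prop1, mean1, and ndraws are ones I made up. Because the AGGREGATE above replaced your data, rerun the input program (or reopen your saved data file) before trying this.

```spss
* Run this on the draw-level data (sample and draw), not on the aggregated file.
AGGREGATE OUTFILE = *
  /BREAK = sample
  /percent1 = PIN(draw, 1, 1)
  /prop1 = FIN(draw, 1, 1)
  /mean1 = MEAN(draw)
  /ndraws = N.
```

Since draw contains only 0s and 1s, prop1 and mean1 should match each other, percent1 should be 100 times as large, and ndraws should equal the number of draws you asked for in each sample.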
Step 2: Record the proportions that appear in the samples and make a histogram of those proportions. (Check whether the histograms are roughly bell-shaped with centers at the probability that you chose.)
To get a histogram of the proportions, select Graphs, Histogram. Double-click on percent1 and click OK.
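The same histogram can be requested from the Syntax Editor. This is a sketch, assuming the aggregated variable is named percent1 as above. The (NORMAL) keyword is optional; it asks SPSS to draw a normal curve over the bars so you can judge the bell shape by eye.

```spss
* Histogram of the per-sample results, with a normal curve overlaid for comparison.
GRAPH
  /HISTOGRAM(NORMAL) = percent1.
```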
If you left the number of samples at 5, I bet your histogram isn't very bell-shaped. Edit the first program so that the LOOP #Sample command looks like this.
LOOP #Sample = 1 to 50.
Run both programs again and get a histogram again. Now you can see a bell-shaped histogram, I bet. Jot down for yourself how wide the histogram is. What proportion is at the bottom (left) of the histogram, and what proportion is at the top? Where's the center? Is it at the probability we were using in RV.BINOMIAL?
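If reading the bottom, top, and center off the picture is awkward, SPSS can report them as numbers. A minimal sketch, again assuming the variable is named percent1:

```spss
* Center (mean), spread (standard deviation), and the lowest and highest values.
DESCRIPTIVES VARIABLES = percent1
  /STATISTICS = MEAN STDDEV MIN MAX.
```

The minimum and maximum are the left and right ends of the histogram, and the mean is one reasonable measure of its center. Keep these numbers so you can compare them with what you get from larger samples.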
Step 3: Repeat the process with larger samples. You'll notice that your histograms are narrower with larger samples. That's the law of large numbers.
To get larger sample sizes, edit the program so the LOOP #Case command looks like this.
LOOP #Case = 1 to 100.
Rerun the programs and get a histogram. Now how wide is the histogram? You used a larger sample size. Is the histogram narrower?
Take some time to play with the probability of getting a 1, adjusting the number of samples and observations in each sample. When can you count on a nice-looking bell curve? When can you count on not getting a bell curve?
Save Your Work
You will be using the Syntax programs you've created so far in later sections. To save yourself some typing time, save all your Syntax programs.
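One way to organize what you save is to collect the pieces into a single Syntax file. The sketch below strings together the programs from this activity with 50 samples of 100 draws each. The file name draws.sav is a placeholder of mine; use whatever name and folder suit you.

```spss
* Simulate 50 samples of 100 draws each.
INPUT PROGRAM.
LOOP #Sample = 1 TO 50.
    LOOP #Case = 1 TO 100.
        COMPUTE sample = #Sample.
        COMPUTE draw = RV.BINOMIAL(1,.5).
        END CASE.
    END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Keep a copy of the draw-level data; the file name is a placeholder.
SAVE OUTFILE = 'draws.sav'.

* Collapse to one row per sample, then plot.
AGGREGATE OUTFILE = *
  /BREAK = sample
  /percent1 = PIN(draw, 1, 1).
GRAPH
  /HISTOGRAM = percent1.
```

To try a different probability, number of samples, or sample size, change the .5, the 50, or the 100 and run the whole file again.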
©2008 Key College Publishing. All rights reserved.