Data Matters with Excel®
Activity 2.1
Section 2.1 claims that a random sample of a population will tend to be roughly representative. This is more so for larger random samples. The claim applies to all of the aspects of the population. Without paying attention to the number of men and women in the population, random sampling tends to produce a sample that has roughly the same proportions of men and women as the population. This applies to any aspect of the population.
This claim sounds dubious. In fact, it is so counterintuitive that statisticians who should know better have lost confidence in it. In this project, you will check whether this claim is true (and what I mean by roughly representative will become clear).
Starting with a population of 50,000 people, you will find out what the proportions are in that population and take some random samples. The claim is that the samples will have roughly the same proportions as the population. Lets see.
For this project (and some later ones), you will need a copy of the file RepUSSample.xls.
The data file includes data on 50,000 Americans surveyed in March 2001. These 50,000 Americans are a roughly representative sample of Americans. For this project, you are going to think of them as a population, as 50,000 people who live in a town.
Open RepUSSample.xls. The first column in the data is labeled ID_Number. It is an identification number for each person. You are going to take random samples of the population, and you can tell who was selected by looking at their ID numbers.
The third column is labeled Education. It shows the highest level of education each person has obtained. For example, the first person (ID number 1) is a high-school graduate who did not go to college at all. (Education was recorded only for people who were more than 15 years old, which you can tell by looking at the second column for age.)
We want to know about the population. Find the proportions of people with each level of education. Click on C at the top of the third column to select the Education data. Select Data, PivotTable and PivotChart Report. A dialog box pops up. Click on Next, Next, Layout. Drag and drop Education onto Row. Drag and drop Education onto Data. Double-click Count of Education. Click on Options. Click on the triangle near Show data as: and select % of column. Click OK, OK, Finish. The PivotChart appears on a new worksheet. To get back to the data, click on the Data tab at the bottom of the screen.
Find and record several proportions from the population. Include the proportion who have high-school diplomas, the proportion who are under 16, and the proportion who have bachelors degrees. To make sure you are getting the proportions right, include the gender proportions in the proportions you collect. You should have 50.96% female.
Select Insert, Worksheet. Double-click on the tab of the new worksheet at the bottom of the page. (The tab says something like Sheet1.) Type in a new name for the sheet: Sample.
Start by copying the column labels from the Data worksheet to the Sample worksheet. To do that, on the Data worksheet, right-click on the 1 at the top of the gray column on the left. The row of labels will darken to indicate that they are selected. Left-click on Copy on the little drop-down menu that appears. Click on the tab at the bottom of the screen for the Sample worksheet. Right-click on the 1 in the gray column and left-click on Paste. You should now see the column labels in the top row of Sample.
You are going to use an Excel macro to take a small random sample of the population. Select Tools, Macro, Visual Basic Editor. (Visual Basic is Excels macro language.) Click on Insert, then Module. A white space appears. You will type your program into that white space.
Here is the program you will use.
Sub takeSample()
Randomize Timer
For SampleRow = 2 To 11
dataRow = 2 + Fix(50000 * Rnd)
Worksheets("Data").Rows(dataRow).Copy
Worksheets("Sample").Rows(SampleRow).PasteSpecial
Next SampleRow
End Sub
|
Sub takeSample()
Sub takeSample() tells the Visual Basic macro editor that the macro is beginning and that the macros name is takeSample. The parentheses are provided in case the macro has to be one that is sent information. TakeSample is just the name picked for this program. It means nothing to Excelyou could just as well call yours blahblah, by typing Sub blahblah().
Randomize Timer
Randomize Timer sets the Visual Basic random number generator to start at a random place in the series of numbers it produces. Without Randomize Timer, the random number generator produces a random series of numbers, but they are the same random series every time. It would be as though someone had a printed table of random numbers, and they started on the first number in the printed table every time.
For SampleRow = 2 To 11
This is a For-To statement. A For-To statement tells Visual Basic to carry out the upcoming instructions a number of times, once for each value. SampleRow is the name chosen for a variable that will hold the numbers referring to rows in the Sample worksheet. In the case of this For-To statement, the instructions are completed once for when SampleRow equals 2, then again for when SampleRow equals 3, and so on until it does it a last time with SampleRow equaling 11.
dataRow = 2 + Fix(50000 * Rnd)
This instruction uses the random number generator (Rnd) to pick a random number between 2 and 50,001. DataRow is the name chosen for a variable that will hold numbers referring to rows in the Data worksheet. It could as easily have been named dr or rowInTheDataWorkSheet.
In the worksheet, you remember, RAND() gives a random number. But in the macro editor, Rnd produces the random numbers. Rnd produces a random number between 0 and .99999. When this is multiplied by 50,000, we get a random number between 0 and 49,999.999. Fix( ) strips off the decimal part, so we are left with a random integer between 0 and 49,999. We add 2 and set dataRow to equal the sum. In this way, dataRow is set to an integer between 2 and 50,001, and the system works so that every integer in that range is equally likely to be used for dataRow.
This is the random sampling step. There are 50,000 rows of data in the Data worksheet, one row for each person. This part of the macro picks a random number in such a way that every row (from 2 to 50,001) is equally likely to be selected for inclusion in the sample. Thats random sampling.
In this project, we are using sampling with replacement. This means that, as the random sample is selected, every person in the population has an equal chance of being selected every time someone is selected, even if they have already been selected. For this reason, sampling with replacement can select the same person one, two, or more times for the same sample. It is called with replacement because it is as if a selected person were put back (replaced) in the population before the next selection.
Most research is done without replacement, but the effect of random sampling works both with and without replacement. (It just works more simply without replacement.)
Worksheets(Data).Rows(dataRow).Copy
This instruction copies the data in the randomly chosen row of the population worksheet. Worksheets(Data). tells Excel to look on the Data worksheet. Rows(dataRow). tells Excel to look at the row that matches the random number between 2 and 50,001 that we selected earlier. Copy tells Excel to copy that row of data.
Worksheets(Sample").Rows(SampleRow).PasteSpecial
This instruction pastes the copied data into the next row in the Sample worksheet. Worksheets(Sample"). tells Excel to look at the Sample worksheet. Rows(SampleRow). tells Excel to look at the row indicated by SampleRow. PasteSpecial pastes the data into that row.
Next SampleRow
Next SampleRow tells Excel that the For-To instructions are over. If there is another value for the For-To instruction to work with, then SampleRow is increased by 1 and Excel goes back to the first instruction after the For-To statement, which is to pick a random population row.
End Sub
End Sub indicates the end of the macro.
To run the macro, click on the black triangle at the top-center of the screen.
Get back into the Sample worksheet by clicking on the Excel button at the bottom of your screen. Look at the ID numbers in the random samples. Can you see any pattern in which people are chosen? You shouldnt be able toevery person in the population collection was equally likely to be selected for the sample each time.
Now we will take a random sample with 400 people in it. To do this, change the For-To instruction to For SampleRow = 2 To 401. Run the macro, and you will have a 400-person random sample of the population.
Earlier you learned some of the proportions in the population of 10,000 people. Click anywhere on the worksheet, then use a PivotTable to get those same proportions for the 400-person random sample. Because the sample is a random sample, its proportions should be roughly the same. By roughly in this case, I mean within 5%. For example, the proportion in the sample who have a bachelors degree is unlikely to be more than 5% away from the populations proportion.
Record the proportions in the sample. Note whether the random samples are actually roughly representative of the population they were drawn from.
Play with this for a while. Try a few more samples. Note that the proportions always stay near the populations proportions.
The most important thing to note is that there are many proportions in this population and the random sampling creates a sample that roughly matches all of the proportions simultaneouslyand it does so without looking at those attributes. All it does is select people by a system that gives every person an equal chance of being selected for the sample.
Save Your Work
As you continue, you will be creating more macros. In later sections, you will use those macros, so save yourself some typing timesave your macros. You can copy and paste them into Word documents. Later you will be able to copy them from the Word documents back into the Visual Basic macro editor.
| ©2008 Key College Publishing. All rights reserved. |
|