Workshop Statistics: SPSS® Companion Manual
Guide for Instructors SPSS Companion Addendum

William C. Rinaman, Le Moyne College

This document serves as an addendum to the Workshop Statistics Guide for Instructors, covering issues that are specific to the SPSS Companion Manual for the text.

General Comments:
Topic-by-Topic Notes:
Topic 1 Data and Variables

SPSS is not introduced until Topic 2, so students will not be using it in this topic. However, you will probably want to save the signature data (length, height, letters, and gender) from question 3 of the Preliminaries for students to use in creating the SIGNATURES.SAV file in Activity 2-12. You might also want to save the table of responses produced in question 6 of the Preliminaries as an SPSS file for future use.

Topic 2 Data, Variables, and SPSS

Students are introduced to using SPSS for the first time in this topic. Instructions cover entering data (with the Data view and the Variable view), creating plots (with Graphs), and calculating new attributes (with Transform > Compute).

Activity 2-1 introduces students to the use of SPSS. Data entry is accomplished using an empty Data view. Many students are already comfortable with entering data into spreadsheets. When students first enter something into an empty Data view, SPSS generates a dummy variable name. Students are then shown how to use the Variable view to name variables and set the correct attributes. Remind students to check that each variable has the correct type. For example, a Nominal variable cannot be used when a Scale variable is expected. Students should save the data once entry is complete. If your students need special instructions on where to save data at your installation, now is the time to provide them. This activity also includes the first look at SPSS graphs. SPSS does not draw dot plots, so students are introduced to histograms earlier than they are in the main text. Take some time to explain how to interpret a histogram. The final important SPSS feature to be introduced in this activity is Transform > Compute. This is the primary tool for creating new variables that are functions of existing variables. The most common mistakes that students make when using this feature are not naming a target variable and typing variable names directly into the Numeric Expression box.
Encourage them to use SPSS’s Selection feature to place variable names in the Numeric Expression box, so that a mistyped name does not go unrecognized by SPSS.

In Activity 2-2 students get their first opportunity to use an existing data set in an SPSS document (GENPHYS.SAV). You will need to direct students to where the SPSS documents for your course are stored. This activity reinforces the use of Transform > Compute to calculate values for a new attribute and the procedure to create a histogram, while also introducing the Sort procedure (Data > Sort cases). Note that sorting by one variable automatically rearranges all of the variables, leaving each case intact.

Activity 2-3 gives more practice using Transform > Compute to create new variables from existing variables. Suggest that students save their final data set (under a different name and/or location) at the conclusion of the activity, for later reference.

Activity 2-4 contains a more complicated use of Transform > Compute, using the logical operators to create a 0-1 binary variable, where 1 represents states over 25% and 0 represents 25% or less. Encourage students to get in the habit of assigning value labels to categorical variables. Students then see how they can generate side-by-side histograms for one attribute (SAT average) when the data are grouped by a second attribute.

Homework Activities 2-5 through 2-12 let students practice their newly acquired SPSS skills. Most introduce students to additional existing data sets, while two (2-5 and 2-12) require students to create the data set with their own data.

Topic 3 Displaying and Describing Distributions

Activity 3-4 introduces the stemplot. SPSS produces stemplots in Analyze > Descriptive statistics > Explore.

Activity 3-5 further introduces the histogram. This activity also gives students a chance to manipulate a histogram using the SPSS Chart editor. The menu system here is complicated.
Homework Activities 3-8, 3-10, 3-16, and 3-18 ask students to create histograms for new data from SPSS files, while 3-12 and 3-19 allow them to use their own data.

Topic 4 Measures of Center

Activity 4-2 introduces Analyze > Descriptive statistics > Explore as the primary SPSS tool to compute summary statistics. This activity also demonstrates, in 4-2(c), how students can obtain summary statistics for subgroups determined by a categorical variable. Note that the SPSS Save files for 4-2(f)-(i) are identified in the table following 4-2(i).

Activity 4-3 shows students how to delete cases and observe the effects on the mean and median.

SPSS is needed for Homework Activities 4-9, 4-11, and 4-16 and may be useful for 4-6, 4-7, 4-12, 4-13, and 4-15. A clever student might choose to exploit the "drag a point" feature of SPSS to help construct examples for 4-12 and 4-13.

Topic 5 Measures of Spread

This topic introduces calculations for the lower quartile (Q1), upper quartile (Q3), interquartile range (IQR), standard deviation (s), maximum (max), and minimum (min). Each can be computed in SPSS by using Analyze > Descriptive statistics > Explore, making sure that Percentiles is selected in the Statistics dialog box. Be sure to point out that Tukey’s Hinges give the quartiles that students computed by hand.

Activity 5-1 includes the first look at a boxplot in SPSS. Be aware that SPSS always draws a "modified" boxplot (with “mild” outliers shown as individual dots and “extreme” outliers shown as individual asterisks), although students won't encounter this idea formally until Activity 6-2.

Activity 5-4 compares the effects of an outlier on the standard deviation, interquartile range, and range.

Homework Activities 5-8, 5-10, 5-13, 5-16, 5-22, 5-24, and 5-26 all call for the use of SPSS, particularly whenever a standard deviation is to be calculated.
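The comparison in Activity 5-4 can be previewed outside SPSS. The following is a minimal Python sketch, not SPSS output: the data values are hypothetical, the function names are illustrative, and the hinge rule assumes that the median is counted in both halves when the number of cases is odd.

```python
from statistics import median, stdev


def tukey_hinges(values):
    """Return (lower hinge, upper hinge) of the data.

    Assumption: when n is odd, the median belongs to both halves.
    """
    data = sorted(values)
    half = (len(data) + 1) // 2  # number of cases in each half
    return median(data[:half]), median(data[-half:])


def spread_summary(values):
    """Range, interquartile range (from the hinges), and sample standard deviation."""
    lower, upper = tukey_hinges(values)
    return {
        "range": max(values) - min(values),
        "iqr": upper - lower,
        "sd": stdev(values),
    }


# Hypothetical data; the second list replaces the largest value with an outlier.
original = [12, 15, 17, 18, 20, 21, 23, 24, 26, 30]
with_outlier = original[:-1] + [90]

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label, spread_summary(data))
```

In this hypothetical data set the interquartile range is the same in both lists, while the range and the standard deviation both grow once the outlier is present.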
Topic 6 Comparing Distributions I: Quantitative Variables

Activity 6-2 introduces the procedure for constructing a modified boxplot (by hand): checking for points that lie more than 1.5 IQR beyond the quartiles. This is similar to the procedure used for any boxplot produced in SPSS. SPSS modifies the process further by taking points more than 3 IQR beyond the quartiles and making them “extreme” outliers. Although the activity only asks students to produce plots by hand, you might suggest that they check their results by producing the same boxplots in SPSS after doing Activity 6-4, since the data in GOLFERS99.SAV are already in the form required for that procedure. The only way to get a boxplot of a single sample is in Analyze > Descriptive statistics > Explore.

Activity 6-4 contains the only new SPSS instructions in this topic. This is a good point for a hint about setting up the data. Comparisons will work much more efficiently in SPSS (and many other statistics programs) if all the data for a variable are in the same column, and a second variable identifies the groups (rather than having the ages for Oscar-winning films and the ages for non-Oscar-winning films in separate columns).

Students should access existing SPSS files with data for Homework Activities 6-5, 6-7, 6-10, and 6-18. Using SPSS is optional for 6-8, 6-9, 6-20, and 6-21. Other homework activities either specifically request work to be done by hand or provide graphical displays in the text.

Topic 7 Comparing Distributions II: Categorical Variables

This topic demonstrates several ways to work with categorical data in SPSS. These include a two-way table and a segmented bar chart.

Activity 7-5 contains the instructions for using Crosstabs in SPSS to create a two-way table, with marginal totals, of gender and party affiliation in the 1999 U.S. Senate. Activity 7-5(c) goes through similar steps with a segmented bar chart.
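For instructors who want to show what a two-way table with marginal totals involves before students open the Crosstabs dialog, here is a minimal Python sketch. It is not SPSS output: the eight cases are hypothetical (they are not the Senate data), and the function names are illustrative.

```python
from collections import Counter


def two_way_table(cases):
    """Tally (row, column) pairs into cell counts plus marginal totals."""
    cases = list(cases)
    cells = Counter(cases)
    row_totals = Counter(row for row, _ in cases)
    col_totals = Counter(col for _, col in cases)
    return cells, row_totals, col_totals, len(cases)


def column_proportions(cells, col_totals):
    """Conditional distribution of the row variable within each column."""
    return {
        (row, col): count / col_totals[col]
        for (row, col), count in cells.items()
    }


# Hypothetical raw cases, one (gender, party) pair per person.
cases = [
    ("female", "Democrat"), ("female", "Republican"),
    ("male", "Democrat"), ("male", "Republican"),
    ("male", "Republican"), ("male", "Democrat"),
    ("female", "Democrat"), ("male", "Republican"),
]

cells, row_totals, col_totals, n = two_way_table(cases)
props = column_proportions(cells, col_totals)

for (row, col), count in sorted(cells.items()):
    print(row, col, count, round(props[(row, col)], 2))
print("row totals:", dict(row_totals))
print("column totals:", dict(col_totals))
print("grand total:", n)
```

The column proportions are the numbers that a segmented bar chart displays graphically, one bar per column category.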
Homework Activity 7-15 includes a note about adding column proportions to a two-way table in SPSS to show a conditional distribution numerically, then compares the results to a ribbon plot. You may want to point out the corresponding function for row proportions, which shows the conditional distribution in the other direction. Most of the other homework activities simply provide summary data for the two-way table; thus, they should not require SPSS. Possible exceptions are 7-16, 7-17, and 7-18, for which students may already have the raw data in SPSS files.

Topic 8 Graphical Displays of Association

Activity 8-4 contains the instructions for using SPSS to produce a scatterplot. Students will always want to create a simple scatterplot. This is an easy process for students, who should now be adept at dragging an attribute and dropping it on an axis. Part (d) of this activity introduces another new SPSS tool, the filter. Students should think of a filter as a mechanism for restricting the analysis to only those cases that satisfy its condition. It is important to note that a filter eliminates cases from analysis. Note also that the cases are not actually deleted from the data, so they may be restored by deleting or modifying the filter.

Activity 8-5 contains the other new SPSS techniques in this topic: methods for producing a labeled scatterplot. The first method uses the Scatterplot dialog box that students were introduced to in the previous activity. The second method uses interactive scatterplots.

Each of the Homework Activities has an associated SPSS document, although 8-7, 8-8, 8-10, 8-18, and 8-19 do not require the use of SPSS.

Topic 9 Correlation Coefficient

A correlation coefficient is computed in SPSS using Analyze > Correlate > Bivariate.

Activity 9-1 asks students to have SPSS compute a correlation using the dialog box. In 9-1(i) students will again use a filter to select only the public four-year colleges (Type="pub4").
Although the instructions are explicit, you will need to remind some students about the use of filters.

Activity 9-5 uses an SPSS script for the first time. Students will need to be told where to find the randcorr script and how to run an SPSS script. Its use is straightforward and clearly explained in the manual. A Java applet that implements this activity is available here.

Since we are assuming that students will use SPSS to compute correlations, the Homework Activities will almost all require SPSS. The exceptions are 9-8, 9-15, and 9-16, which look only at the direction of association or examine issues of cause and effect.

Topic 10 Least Squares Regression I

Activity 10-1 walks students through eyeballing a line that seems to fit the data. They are then introduced to Analyze > Regression > Linear to see how SPSS performs the least squares fit, a unique introduction to the concept of a least squares line.

Activity 10-2 introduces the interactive scatterplot feature that draws the least squares line on the scatterplot.

Activity 10-3 uses the ability to save residuals and the calculator to produce squared residuals and squared deviations from the mean. Students are then shown how to use Analyze > Descriptive statistics > Descriptives to compute the sums of squares necessary to obtain r².

Activity 10-4 uses the Split file feature to compute regressions using year founded to predict tuition for the different categories of colleges.

All of the Homework Activities for this topic (except 10-5, 10-12, and 10-17) assume that students will use SPSS and existing data sets to answer questions about least squares lines.

Topic 11 Least Squares Regression II

Activity 11-1 introduces the ability in SPSS to save the predicted values (fits) from a regression equation. Activity 11-1(f) directs students to use Data > Select cases to temporarily delete the giraffe from the data.
Activity 11-1(i) directs students to return the giraffe to the data and to temporarily remove the elephant from the data. You might mention that the potential to influence a least squares line is often referred to as the "leverage" of a data point, since the dynamic interplay of a single extreme point and the least squares line makes the analogy to a teeter-totter so compelling.

In Activity 11-3 students see how to easily transform either the predictor or the response attribute to try to improve the linearity of the fit. Although some software (and calculators) allow students to pick from a menu of automated transformed regression models, students need to explicitly create the transformed variables in SPSS. This makes the process slightly more cumbersome but helps ensure that the students understand what they are doing to the data, rather than relying on a "black box" to spit out a model.

All of the Homework Activities for this topic, except 11-15, require the use of SPSS.

Topic 12 Sampling

The new SPSS idea in this topic is the use of SPSS scripts to draw random samples. We strongly recommend doing Activities 12-3 through 12-6 in sequence because they lead students through increasingly automated procedures for selecting a sample of senators from the 1999 U.S. Senate. Having some experience selecting their own samples by hand will help students when the computer takes over in the later activities.

Activity 12-5 describes the process for generating a random sample using an SPSS script.

Activity 12-6 illustrates that the size of the population does not influence the results of sampling.

Homework Activities 12-9 and 12-18 are the only homework activities that require SPSS. In 12-18, students are asked to repeat what they did in Activities 12-5 and 12-6 for sample sizes of 5 and 20 senators.

Topic 13 Designing Studies

No new uses of SPSS appear in this topic.

Topic 14 Probability

The new use of SPSS in this topic is as a simulation tool using SPSS scripts.
Although students first do simulations "by hand," they still may have difficulty making the connections to the computer simulations. You may want to work through some of these as a class or be ready to help make the connection after students have done an SPSS simulation.

Activity 14-4 has students use the SPSS script binsample to simulate the number of girls in families with four and ten children.

Most of the Homework Activities do not require SPSS, with a couple of notable exceptions. Homework Activity 14-12 uses random binomial values and is very similar to in-class Activity 14-4. Homework Activity 14-13 uses the binsample script to simulate individual coin flips.

Topic 15 Normal Distributions

Activity 15-1(n) uses the CDFNORM and CDF.NORMAL functions in the calculator to compute normal probabilities.

Activity 15-2 uses the functions introduced in Activity 15-1. In addition, 15-2(j) introduces students to the IDF.NORMAL function for finding a normal endpoint with a given cumulative probability.

None of the Homework Activities specifically ask for students to use SPSS, although students could do so for any that require normal calculations. You may want to request that a particular technology (SPSS, normal table, or calculator) be used for certain problems if you want to be sure that students get some practice with that technology.

Topic 16 Sampling Distributions I: Proportions

Activity 16-3 uses SPSS to simulate proportions from samples of Reese's Pieces. Note the importance of first doing the simulation "by hand" with real candies in Activity 16-2 before moving to technology to simulate the same process many more times and much more efficiently. The script reeses computes phat for an entire sample, so each case represents a new sample. Note that 16-3(f) uses the variables one, two, and three that are created by the script to count the number of phat values in the desired intervals.
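The counting step in 16-3(f) can be mimicked with a short simulation. This is a minimal Python sketch, not the reeses script. The population proportion of 0.45, the 500 samples, the seed, and the choice of intervals (within one, two, and three standard deviations of the population proportion) are all assumptions made for illustration. The names one, two, and three are reused only to echo the script's variables.

```python
import math
import random


def simulate_phats(p, n, num_samples, rng):
    """Each returned value is the sample proportion (phat) from one sample of size n."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


def count_within(phats, p, n):
    """Count phat values within 1, 2, and 3 standard deviations of p."""
    sd = math.sqrt(p * (1 - p) / n)
    return tuple(
        sum(abs(phat - p) <= k * sd for phat in phats)
        for k in (1, 2, 3)
    )


rng = random.Random(2024)  # fixed seed so the run is repeatable
p, n, num_samples = 0.45, 25, 500  # assumed values, for illustration only

phats = simulate_phats(p, n, num_samples, rng)
one, two, three = count_within(phats, p, n)
print(one / num_samples, two / num_samples, three / num_samples)
```

Rerunning the sketch with a larger n shrinks the standard deviation of phat, so the same three intervals become narrower.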
After working with samples of size 25, parts 16-3(i)-(m) look at a larger sample size (75).

Homework Activities 16-9, 16-11, 16-12, and 16-15 give students more practice using SPSS to simulate sample proportions. To run the BINSAMPLE.SBS script in Homework Activity 16-9, students should open an empty data view and create an empty variable named x.

Topic 17 Sampling Distributions II: Means

Activity 17-2 has students use the SPSS script sample to simulate sample means drawn from a population of 1000 pennies. The latter parts of Activity 17-2 have students explore the effects of changing sample size and the nature of the underlying distribution. Parts 17-2(j)-(m) investigate the effect of sampling from distributions with different shapes to see the central limit effect.

Homework Activity 17-6 is the only one that asks students to use SPSS to simulate a sampling distribution by methods similar to Activity 17-2. Students may also use SPSS for constructing visual displays in Homework Activity 17-5.

Topic 18 Central Limit Theorem

This topic contains no new uses of SPSS, although students may use SPSS (as in Topic 15) for doing normal distribution calculations.

Topic 19 Confidence Intervals I: Proportions

For the examples from the text, the SPSS script ciprop will use the normal approximation to compute a confidence interval for a proportion.

Activity 19-4 introduces students to using SPSS to calculate a confidence interval for a proportion. In contrast to other SPSS procedures students have encountered, this may be done without any raw data. Most of SPSS's inference procedures cannot work with summary statistics entered directly. If students have the raw data, they can get a confidence interval by running Analyze > Descriptive statistics > Frequencies to obtain the sample size and the number of successes and then running the script.

Activity 19-5 uses the SPSS script confsim to simulate many confidence intervals.
It then determines how many of the simulated intervals contain the population proportion. It also creates a graph of some of the simulated intervals. The goal of this simulation is to have students generate enough 95% confidence intervals (200) to see that about 5% will fail to cover the "true" proportion of 45%. For 19-5(c) you might suggest that students sort the cases according to the values in the variable inthere to facilitate finding those intervals that do not contain the population proportion.

The Homework Activities provide a number of opportunities for students to practice constructing and interpreting confidence intervals for proportions. Most contain summary data with sample size and count (or sample proportion) where students could use either the formula on page 420 or SPSS to calculate the interval. Particularly in Homework Activities such as 19-14 and 19-15, which systematically vary a parameter such as the sample size or confidence level, we recommend using SPSS to perform the calculations.

Topic 20 Confidence Intervals II: Means

Activity 20-1 contains the introduction to the t-distribution.

Activity 20-5 asks students to use SPSS to calculate a confidence interval for a mean. This is done using raw data and Analyze > Descriptive statistics > Explore. We do provide a script cimean that constructs a confidence interval for the mean using summary statistics, but it is not needed in this topic. The process with summary statistics is very similar to what students encountered in Activity 19-4 for a proportion confidence interval. Students may want to use this script in Homework Activity 20-10.

Although SPSS can be used to find critical values of the t-distribution, the text never asks students to do so. If you would like to add this, we recommend a formula of the form IDF.T(0.975,df), where the first argument is the probability below the desired t* value.
This might be done in conjunction with Homework Activity 20-7, where students are asked to find lots of t* values from the t-table.

Homework Activities 20-8, 20-9, 20-12, 20-15, 20-16, and 20-18 may use SPSS to produce graphics and confidence intervals. 20-18(d) uses the script pennysim to perform the requested simulation.

Topic 21 Tests of Significance I: Proportions

Activity 21-2 has students use the SPSS script testprop to verify their hand calculations for a test for a proportion. As they first did with a proportion confidence interval in Activity 19-4, students should enter the summary statistics and the null hypothesis proportion and select the appropriate alternative hypothesis in the dialog box.

Activity 21-3 can use SPSS only in part (f), although you might choose to have students do that test by hand.

Activity 21-4 uses SPSS to quickly check tests for different sample sizes where the sample proportion stays fixed.

The Homework Activities all involve proportion tests from summary data and can be done either by hand or with SPSS.

Topic 22 Tests of Significance II: Means

Activity 22-1 uses SPSS briefly to compute summary statistics and to obtain a visual display of the data (NBAPOINTS.SAV) to be tested in the next activity.

Activity 22-3 starts by asking students about their intuition of how significance depends on sample size and variability of some hypothetical data and then introduces Analyze > Compare means > One-sample t-test to use SPSS to perform several tests to quickly check their conjectures.

Activity 22-4 has students use SPSS to perform a paired difference in means test. SPSS does have a procedure for this, but since the activity asks students to compute the differences and plot them, you may want to have them use the regular t-test and then show them the paired comparison procedure. Point out that the paired comparison procedure does not let them control how the differences are calculated.
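The point about the direction of the differences can be illustrated with a minimal Python sketch. It is not SPSS output: the paired measurements are hypothetical and the function name is illustrative. The sketch treats the paired test as a one-sample t-test on the differences and shows that reversing the order of subtraction only changes the sign of the t statistic.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t, n - 1


# Hypothetical paired measurements on the same six subjects.
before = [72, 80, 65, 90, 77, 84]
after = [75, 79, 70, 95, 80, 88]

after_minus_before = [a - b for a, b in zip(after, before)]
before_minus_after = [b - a for a, b in zip(after, before)]

t1, df = one_sample_t(after_minus_before)
t2, _ = one_sample_t(before_minus_after)
print(t1, t2, df)
```

Students who compute the differences themselves choose the order of subtraction, which matters when the alternative hypothesis is one-sided.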
Homework Activities 22-7, 22-10, 22-11, 22-12, 22-14, and 22-16 require SPSS. We provide a script testmean that uses summary statistics. Consider whether you want students to use it. Homework Activity 22-16 also uses the script babysim to conduct the requested simulation.

Topic 23 More Inference Considerations

This topic contains no new uses of SPSS, although most of the activities include SPSS components that students have seen in previous activities.

Activities 23-1, 23-2, and 23-3 have students compute confidence intervals and tests for a proportion from summary data with SPSS.

Activity 23-4 introduces students to the concept of power through a simulation with SPSS using the script binsample.

Activity 23-6 uses SPSS to provide summary statistics, compute confidence intervals for means, and then produce visual displays to demonstrate that data with the same summary characteristics might have drastically different appearances.

Almost all of the Homework Activities involve constructing confidence intervals and/or performing tests for proportions or means. You may decide how to balance these between hand calculations and SPSS, but we recommend using SPSS in Homework Activities such as 23-15, 23-20, 23-21, and 23-23, which require multiple inferences with varying parameters. Homework Activities 23-25 and 23-26 require SPSS to work out the details of simulations (similar to Activity 23-4) to verify the effects of moving the "true" proportion or changing the significance level on the power of a test.

Topic 24 Comparing Two Proportions

Note that SPSS does not have two-sample inference procedures to compare proportions. We provide two scripts to cover this topic, ci2prop for confidence intervals and test2prop for tests. The scripts use summary statistics. Consider introducing test2prop in Activity 24-1 and ci2prop in Activity 24-2.

Activity 24-3 formally introduces the two-sample tests for proportions with the SPSS script test2prop. Its use is straightforward.
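To check results computed from summary statistics independently of SPSS, a minimal Python sketch of the usual normal-approximation calculations follows. It is not the code of the test2prop or ci2prop scripts, whose internals are not described here. The function names and counts are hypothetical, and the use of a pooled proportion for the test and an unpooled standard error for the interval is an assumption.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2 (pooled proportion)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def two_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical summary counts: 40 successes out of 100 versus 30 out of 100.
z, p_value = two_proportion_z_test(40, 100, 30, 100)
low, high = two_proportion_ci(40, 100, 30, 100)
print(round(z, 3), round(p_value, 3), round(low, 3), round(high, 3))
```

A one-sided p-value, if the alternative hypothesis calls for it, would use only one tail of the normal distribution instead of doubling.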
All of the Homework Activities involve doing either a confidence interval or the test for the difference in two proportions (or both). Homework Activity 24-6 is the first place students will encounter using SPSS to do a confidence interval for the difference in two proportions. As with earlier topics, you may determine whether you want students to use SPSS to help with the calculations. Data for summary counts are given in each problem. Homework Activities 24-7, 24-17, 24-19, 24-22, and 24-23 require calculating several tests or intervals that would be appropriate to do with SPSS.

Topic 25 Comparing Two Means

Two issues need to be addressed for doing a two-sample t-test or confidence interval with statistical software. Should we use unpooled or pooled variances when computing the standard error? The text uses only the unpooled option, so students may be confused by the fact that SPSS does both the pooled and unpooled versions each time. What about the degrees of freedom? The text uses the conservative approach (smaller of the two degrees of freedom for the separate samples), while SPSS uses a more complicated formula (known as Satterthwaite's approximation) for approximating the degrees of freedom. Be ready to answer student queries about the degrees of freedom displayed in SPSS.

Activity 25-2 introduces the use of SPSS to do both t-tests and confidence intervals to compare two sample means.

Activity 25-3 asks students to conduct two tests from summary statistics rather than raw data. Although we intend for these to be done by hand, if time is short, students could use the script test2mean.

Many of the Homework Activities (for example 25-11 through 25-21) contain raw data in SPSS files. Others have summary data that can be analyzed by hand or with the SPSS scripts ci2mean and test2mean.
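When students ask why the degrees of freedom in SPSS differ from the text's conservative value, it may help to compute both from the same summary statistics. The following is a minimal Python sketch, assuming the standard Welch-Satterthwaite formula. The function names and the summary statistics are hypothetical, and this is not a description of SPSS's internal code.

```python
import math


def unpooled_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic using the unpooled standard error."""
    return (mean1 - mean2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def conservative_df(n1, n2):
    """Smaller of the two separate-sample degrees of freedom."""
    return min(n1, n2) - 1


def satterthwaite_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical summary statistics for two independent samples.
mean1, s1, n1 = 52.0, 4.0, 12
mean2, s2, n2 = 47.5, 9.0, 20

print(unpooled_t(mean1, s1, n1, mean2, s2, n2))
print(conservative_df(n1, n2), satterthwaite_df(s1, n1, s2, n2))
```

The approximation always falls between the conservative value and n1 + n2 - 2, so the text's approach never gives a smaller p-value than the unpooled SPSS result for the same t statistic.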
Topic 26 Inference for Two-Way Tables

Activity 26-3 introduces the SPSS script chisquare for performing a chi-square test for a two-way table when the table has been entered in a new data set. For 26-3(c), the SPSS script doesn't show the contribution of each cell to the chi-square statistic, so students will need to look at both the size of the discrepancy between the observed and expected counts and the magnitude of the expected count to find the cells making the largest contributions.

Activity 26-5 has students do a simulation of the distribution of the chi-square statistic by randomizing one of the categorical attributes (attitude toward spending on the space program) while fixing a second attribute (political viewpoint). This is done automatically using the script chisim.

Most of the Homework Activities contain two-way tables that can be analyzed by hand or with the chisquare script. Homework Activity 26-8 is specifically designed with partial calculations given to facilitate hand calculations. Homework Activities 26-11 and 26-13 involve generating new tables from existing tables and are best done with SPSS.

Topic 27 Inference for Correlation and Regression

Activity 27-1 introduces the t-test for a significant correlation by hand and with SPSS.

Activity 27-2 might be viewed as your SPSS final exam. Start with some basics by producing a scatterplot [27-2(b)], then find least squares lines and correlations [27-2(c) and 27-2(f)]. Another simulation [27-2(g)] starts with choosing samples from a large bivariate population to see how the sample slopes behave for repeated samples. This uses the script gparegcoef. After students are guided through a hand calculation of the t-test for slope, they see the SPSS test [27-2(o)] and then the confidence interval for slope [27-2(s)]. Finally, the script gparegcoef produces a neat plot showing each of their sample regression lines, giving a visual image of how sample regression lines can vary.
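The idea behind the repeated-sampling simulation in 27-2(g) can be sketched in a few lines of Python. This is not the gparegcoef script. The population below is generated artificially with an assumed slope of 0.5, and the sample size, number of samples, seed, and function names are all hypothetical choices.

```python
import random
from statistics import mean


def least_squares_slope(xs, ys):
    """Slope of the least squares line for predicting y from x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def sample_slopes(population, sample_size, num_samples, rng):
    """Least squares slope from each of several random samples of cases."""
    slopes = []
    for _ in range(num_samples):
        xs, ys = zip(*rng.sample(population, sample_size))
        slopes.append(least_squares_slope(xs, ys))
    return slopes


rng = random.Random(7)  # fixed seed so the run is repeatable

# Artificial bivariate population of 1000 cases (assumed slope 0.5).
population = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    population.append((x, 1.0 + 0.5 * x + rng.gauss(0, 1)))

slopes = sample_slopes(population, sample_size=20, num_samples=200, rng=rng)
print(min(slopes), mean(slopes), max(slopes))
```

The spread between the smallest and largest sample slopes gives a numerical counterpart to the plot of varying sample regression lines.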
All of the Homework Activities require SPSS to analyze existing data files. Homework Activity 27-9 includes a simulation to see how the correlation varies when one of the attributes in a pair is repeatedly scrambled. This activity uses the script shufflecars.
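The scrambling idea in Homework Activity 27-9 can be sketched in Python as follows. This is not the shufflecars script. The ten data pairs, the number of shuffles, the seed, and the function names are hypothetical and serve only to illustrate the technique.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shuffled_correlations(xs, ys, num_shuffles, rng):
    """Correlations obtained after repeatedly scrambling the y values."""
    ys = list(ys)  # copy, so the caller's list keeps its order
    results = []
    for _ in range(num_shuffles):
        rng.shuffle(ys)
        results.append(correlation(xs, ys))
    return results


# Hypothetical paired data with a strong negative association.
weight = [2.0, 2.3, 2.6, 2.9, 3.1, 3.4, 3.8, 4.1, 4.5, 5.0]
mileage = [38, 35, 33, 30, 29, 26, 24, 21, 19, 16]

rng = random.Random(11)  # fixed seed so the run is repeatable
observed = correlation(weight, mileage)
shuffled = shuffled_correlations(weight, mileage, 1000, rng)
as_extreme = sum(abs(r) >= abs(observed) for r in shuffled)
print(observed, as_extreme / len(shuffled))
```

The fraction of scrambled correlations at least as extreme as the observed one plays the role of an approximate p-value for the association.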