There are several ways that you can analyze a temporary or a permanent subset of your data. The examples below will illustrate some of these methods.

## Example 1: Creating a random sample

There may be times when you would like to analyze only a subset of your data. For example, suppose that you have a huge data file with thousands of cases, and that you have written a syntax file to analyze the data. Because the syntax may take hours to run, you may want to take a relatively small sample of your data and run the syntax on that to see if it works properly. There are several ways to create a sub-sample, such as using only the first 100 cases. However, in this situation, it may be best to take a random sample of your data. The SPSS command to do this is **sample**. For this example, we will randomly select approximately 20% of the data, and we will use the **means** command to show the effect of taking the subset.

Let's consider the following data set. It has a subject identifier (**sub**), two independent variables (**iv1** and **iv2**), and two dependent variables (**dv1** and **dv2**).

    data list list / sub iv1 iv2 dv1 dv2.
    begin data
    1 1 1 . 25
    2 1 1 49 37
    3 1 1 50 55
    4 2 1 . 19
    5 2 1 20 38
    6 2 0 23 48
    7 2 0 28 44
    8 3 0 28 68
    9 3 0 . 30
    10 3 0 32 36
    end data.

Note that the SPSS output contains a series of warning messages. The data were read in correctly; SPSS is simply letting you know that it read each missing value designated by a "." in the data above as system-missing.

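    save outfile 'c:sset.sav'.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 10 | 100.0% | 0 | .0% | 10 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 39.0000 | 3 | 15.09967 |
| 2.00 | 37.2500 | 4 | 12.84199 |
| 3.00 | 44.6667 | 3 | 20.42874 |
| Total | 40.0000 | 10 | 14.46836 |

    sample .20.
    means dv2 by iv1.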

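Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 4 | 100.0% | 0 | .0% | 4 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 25.0000 | 1 | . |
| 2.00 | 38.0000 | 1 | . |
| 3.00 | 33.0000 | 2 | 4.24264 |
| Total | 32.2500 | 4 | 5.90903 |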
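Because **sample** draws cases at random, rerunning the syntax will generally select a different subset, so your output need not match the tables above. The following is a minimal sketch of one way to make the draw repeatable. It assumes the default random number generator, to which `set seed` applies (check the SET documentation for your SPSS version), and the seed value is arbitrary and chosen only for illustration.

```spss
* Sketch only: the seed value 123456 is arbitrary and hypothetical.
set seed = 123456.
* Reload the full data file saved earlier, then draw the sample.
get file 'c:sset.sav'.
sample .20.
means dv2 by iv1.
```

Running this block again with the same seed and the same data file should select the same cases.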

Be aware that **sample** takes a
permanent sample of the data in the working file. In other words, the cases that are not selected
are deleted. The next example will illustrate how to take a subset without deleting the non-selected cases.

## Example 2: Creating a temporary random sample

The **temporary** command can be used with most SPSS commands, and we will use it here to create a temporary subset of the data in the working file. The **temporary** command allows you to create or transform variables and is in effect only until the next procedure is executed. In the example below, the **means** command is the procedure that will terminate the **temporary** command. To illustrate this, we will issue the **means** command twice. The first time, the **temporary** command will be in effect, and the descriptive statistics will reflect the reduced number of cases. That first **means** command will also terminate the **temporary** command, so the second **means** command will be run on the full data set.

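    get file 'c:sset.sav'.
    temporary.
    sample .20.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 4 | 100.0% | 0 | .0% | 4 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 39.0000 | 3 | 15.09967 |
| 2.00 | 19.0000 | 1 | . |
| Total | 34.0000 | 4 | 15.87451 |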

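    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 10 | 100.0% | 0 | .0% | 10 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 39.0000 | 3 | 15.09967 |
| 2.00 | 37.2500 | 4 | 12.84199 |
| 3.00 | 44.6667 | 3 | 20.42874 |
| Total | 40.0000 | 10 | 14.46836 |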
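Because **temporary** ends at the first procedure, running several procedures on the same random subset is not straightforward: repeating `temporary. sample .20.` before each one would draw a new sample every time. One possible alternative, sketched here under the assumption that an extra flag variable in the working file is acceptable, is to mark a random subset once and filter on it. The variable name `insample` is hypothetical; `rv.uniform(0,1)` returns a uniform random number between 0 and 1.

```spss
* Sketch: flag roughly 20% of cases at random (insample is a hypothetical name).
compute insample = (rv.uniform(0,1) lt .20).
* Keep only the flagged cases for the procedures that follow.
filter by insample.
means dv2 by iv1.
means dv1 by iv1.
* Restore all cases when finished.
filter off.
```

Both **means** commands here run on the same flagged cases, and no cases are deleted from the working file.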

## Example 3: Selecting a specific number of cases

The **sample** command can also be used to select a specific number of
cases. For example, suppose that you wanted to obtain descriptive statistics on
four cases randomly drawn from the first eight cases in the data set.

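    temporary.
    sample 4 from 8.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 4 | 100.0% | 0 | .0% | 4 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 25.0000 | 1 | . |
| 2.00 | 43.3333 | 3 | 5.03322 |
| Total | 38.7500 | 4 | 10.04573 |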

If you wanted the four cases to be drawn from the entire data set, you
would simply put the total number of cases after the keyword **from**.

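    temporary.
    sample 4 from 10.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 4 | 100.0% | 0 | .0% | 4 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 37.0000 | 1 | . |
| 2.00 | 38.0000 | 1 | . |
| 3.00 | 33.0000 | 2 | 4.24264 |
| Total | 35.2500 | 4 | 3.59398 |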

## Example 4: Selecting a specific number of the first cases

Suppose that you read a large data file into SPSS and you just wanted to see if the data were read in properly. Because the file is large, running descriptive statistics on the entire data set would be time consuming, and would probably not be any more informative than running the descriptive statistics on a small subset. You could use the **n of cases** command to select, say, the first 50 cases in the data file (the example below uses the first five, because the example data set has only ten cases). As with the **sample** command, the **n of cases** command permanently modifies your data set. If you do not want the rest of the cases to be deleted, you will need to use the **temporary** command just before the **n of cases** command. Also note that the **n of cases** command can be shortened to **n**.

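    temporary.
    n of cases 5.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 5 | 100.0% | 0 | .0% | 5 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 39.0000 | 3 | 15.09967 |
| 2.00 | 28.5000 | 2 | 13.43503 |
| Total | 34.8000 | 5 | 13.86362 |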

Equivalently,

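    temporary.
    n 5.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 5 | 100.0% | 0 | .0% | 5 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 39.0000 | 3 | 15.09967 |
| 2.00 | 28.5000 | 2 | 13.43503 |
| Total | 34.8000 | 5 | 13.86362 |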
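When the goal is only to check visually that the first few records were read correctly, another option is the **list** command, which can print a range of cases without selecting or deleting anything. A minimal sketch, assuming the variable names from the example data set above:

```spss
* Sketch: print the first five cases; the working file is not modified.
list variables = sub iv1 iv2 dv1 dv2
  /cases = from 1 to 5.
```

Because **list** only displays cases, no **temporary** command is needed here.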

## Example 5: Selecting cases based on value of one or more variables

Sometimes you may want to select cases based on the value of one or more variables. For example, suppose that you wanted to obtain descriptive statistics for only those cases where **iv1** was greater than one and **dv2** was less than 40. The **select if** command will permanently select that subset of cases from your data set. As with the other commands, you can use the **temporary** command to temporarily select the desired cases.

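    temporary.
    select if (iv1 gt 1 and dv2 lt 40).
    means dv2 by iv2.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV2 | 4 | 100.0% | 0 | .0% | 4 | 100.0% |

Report

DV2

| IV2 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| .00 | 33.0000 | 2 | 4.24264 |
| 1.00 | 28.5000 | 2 | 13.43503 |
| Total | 30.7500 | 4 | 8.53913 |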

You can also use the **select if** command to select cases that have a missing value for the variable of interest. For example, suppose that you wanted to select and analyze the cases for which **dv1** had a missing value.

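    temporary.
    select if (sysmis(dv1)).
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 3 | 100.0% | 0 | .0% | 3 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 25.0000 | 1 | . |
| 2.00 | 19.0000 | 1 | . |
| 3.00 | 30.0000 | 1 | . |
| Total | 24.6667 | 3 | 5.50757 |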
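The same pattern can be reversed to keep only the cases with a valid value. The sketch below assumes that you want to analyze **dv1** for the cases where it is not missing. It uses the `missing` function, which is true for both system-missing and user-defined missing values, whereas `sysmis` checks only for system-missing values.

```spss
* Sketch: temporarily keep only the cases with a non-missing dv1.
temporary.
select if (not missing(dv1)).
means dv1 by iv1.
```

With the example data, this keeps the seven cases whose **dv1** is not missing.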

## Example 6: Filtering by a variable

Another way to subset your data is to filter them by a variable. The variable that is used as a filter must be numeric, and it is normally coded zero/one (i.e., a dummy variable). Cases coded as zero (or with a missing value on the filter variable) are filtered out; cases with any other value are kept. This means that if the filter variable is dichotomous but coded, say, one/two, SPSS will execute the requested command with no cases filtered out, and it will not issue either an error message or a warning message. You can tell if the filter is on by looking in the lower right-hand corner of the data editor for the "filter on" message. You can see which cases are being filtered by looking at the left-most column of the data editor. Cases with a slash through the number are being filtered. The **filter** command does not make permanent changes to your data set, and you can turn it off by issuing the **filter off** command. Let's suppose that you wanted to use **iv2** as a filter.

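    filter by iv2.
    means dv2 by iv1.

Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| --- | --- | --- | --- | --- | --- | --- |
| DV2 * IV1 | 5 | 100.0% | 0 | .0% | 5 | 100.0% |

Report

DV2

| IV1 | Mean | N | Std. Deviation |
| --- | --- | --- | --- |
| 1.00 | 39.0000 | 3 | 15.09967 |
| 2.00 | 28.5000 | 2 | 13.43503 |
| Total | 34.8000 | 5 | 13.86362 |

    filter off.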
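When the variable you want to filter on is not already coded zero/one, one approach is to compute a dummy variable from a logical expression first. A sketch, assuming you want only the cases where **iv1** equals 3; the variable name `iv1_is3` is hypothetical:

```spss
* Sketch: a logical expression evaluates to 1 (true) or 0 (false).
compute iv1_is3 = (iv1 = 3).
filter by iv1_is3.
means dv2 by iv1.
* Turn the filter off so later procedures use all cases.
filter off.
```

With the example data, only the three cases with **iv1** equal to 3 should be included in the **means** output.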

## Example 7: Subsetting to match percentage in sample to percentage in population

Suppose that you conducted a survey of 10,000 people and 70% of your respondents were female. You know that females make up only about 52% of the population, so you would like to take a subset of your female respondents such that the proportion of females to males in your data is more similar to that found in the population. First, you need to calculate how many female respondents you want to keep in your data set (here, 3,250 of the 7,000 females, keeping all 3,000 males). Next, you would put the **sample** command inside a **do if** ... **end if** structure so that it applies only to the female respondents. Finally, you would save the file with a new name, so that your original data would be preserved.

    do if gender = 'female'.
    sample 3250 from 7000.
    end if.
    save outfile 'c:subset.sav'.
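The figure of 3250 in the syntax above can be checked with a short calculation. It assumes that all 3000 male respondents (30% of 10000) are kept, and it writes $f$ for the number of females to retain so that females make up 52% of the resulting file:

```latex
\frac{f}{f + 3000} = 0.52
\quad\Longrightarrow\quad
0.48\,f = 0.52 \times 3000 = 1560
\quad\Longrightarrow\quad
f = \frac{1560}{0.48} = 3250
```

The resulting file would then contain 3250 + 3000 = 6250 cases, of which 3250 / 6250 = 52% are female.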