1.0 SAS statements and procs in this unit
proc format | Creates formats (aka value labels) |
label | Creates labels for variables |
rename | Changes the name of a variable in a data step |
if then | Executes a statement only if the condition is true |
functions | Creating new variables using SAS functions |
2.0 Demonstration and explanation
2.1 Proc Format, labels and renaming variables
Let’s see what the data set looks like. We will continue to use the same data set hs0 from the last unit.
libname in 'c:\sas_data\'; proc contents data = in.hs0; run;
Now let’s format one of the variables, schtyp. To create user-defined value labels, we need to use proc format. Then we can apply those value labels to the appropriate variable. Many procedures in SAS are format-aware and will therefore display output using the created value labels.
proc format; value scl 1 = "public" 2 = "private"; run;
In the example below, we will apply the created formal scl to the variable schtyp using a format statement. Thus, when we run proc freq on the schty the table will created using the applied value label. Note, that when you use a format statement within a SAS procedure, the values lable is temporary.
proc freq data = in.hs0; tables schtyp; format schtyp scl.; run;
We can permanently apply a value label to a variable in a data step using the format statement. Note that this data step also produces a temporary dataset hs0b which is based on the file c:\sas_data\hs0.
data hs0b; set "c:\sas_data\hs0" ; * or use set in.hs0; format schtyp scl.; run;
In this data step, we will label the dataset and add a variable label to the variable schtyp. The label option on the data statement sets the dataset label. The label statement in the data step assigns variable schtyp in the dataset hs0b.
data hs0b(label="High School and Beyond"); set hs0b; label schtyp = "type of school"; rename gender = female; run;
We will now look at the effects of the data step using proc contents.
proc contents data = hs0b; run;
In the data step below we change the name of the variable schtyp to public, and gender to female. Then we use proc contents to see that the changes have been made.
data hs0b; set hs0b (rename=(schtyp=public gender=female)); run; proc contents data=hs0b; run;
2.2 Putting things together in a long data step
Now we will run a longer data step to do a variety of tasks. Comments are used to explain the program. (Note the code below replicates some of the code above.)
proc format; * create value labels for schtyp ; value scl 1 = "public" 2 = "private"; * create value labels for grade ; value abcdf 0 = "F" 1 = "D" 2 = "C" 3 = "B" 4 = "A"; * create value labels for female ; value fm 1 = "female" 0 = "male"; run; * create data file hs1, label it ; data hs1(label="High School and Beyond") ; set in.hs0 (rename=(gender=female)); * label the variable schtyp ; label schtyp = "type of school"; * apply value labels to schtyp; format schtyp scl.; * the if-then statements create a new variable, called prog, which is numeric variable ; if prgtype = "academic" then prog = 1; if prgtype = "general" then prog = 2; if prgtype = "vocational" then prog = 3; * apply value labels to female; format female fm.; * the if statement recodes values of 5 in the variable race to be missing (.) ; if race = 5 then race = .; * create a variable called total that is the sum of read, write, math, and science ; total = read + write + math + science; * the if-then statements recode the variable total into the variable grade ; if (total < 80) then grade = 0; if (80 <= total < 110) then grade = 1; if (110 <= total < 140) then grade = 2; if (140 <= total < 170) then grade = 3; if (total >= 170) then grade = 4; if (total = .) then grade = .; * apply value labels to variable grade; format grade abcdf.; run;
Let’s check to see that everything worked as planned.
proc contents data = hs1; run;proc print data = hs1 (obs = 20); run;proc freq data = hs1; tables schtyp*female; run;
Permanently save the dataset as ‘c:\sas_data\hs1’.
data in.hs1; set hs1; run;
2.3 Creating new variables using procedures
There are also a number of SAS procedures and functions that can be used to modify your data. One such procedure is proc standard, which can be used to standardize your variables (i.e., give the a mean of 0 and a standard deviation of 1). Below we use proc standard to standardize the variables read and write. Note that the variables read and write have been replace by standardized versions of those variables.
proc standard data = hs1 mean=0 std=1 out=hs1b; var read write ; run; proc print data=hs1b (obs=10); run;
2.4 Using SAS functions
In the long data step above, we created the variable total by adding the variables read, write, math and science together (i.e., total = read + write + math + science). We could do something similar using the sum() function, but there is a major difference between the two methods. That difference is in how missing values are handled. When we add items using “+”, a case with missing values on any of the variables listed will have a missing value for the resulting variable. In other words, if a case has valid values of read, write and math, but a missing value for science, total will be equal to missing for that case. However, if we use the sum() function, any missing values will be treated as though they were zero, and the new variable will be equal to missing only if all of the variables listed are missing. Which method is most appropriate depends on the situation and what you are trying to achieve. The code below shows how to use the sum() function in a data step. Together with the sum function, we also show the ordinal function, just for the fun of it.
data hs1b; set hs1; total2 = sum(of read write math science); * similarly, mean, max, min and more; mean1 = mean(of read write math science); run; proc print data=hs1b (obs=20); var read write math science total total2 mean1; run;
3.0 For more information
- The Little SAS Book, Fifth Edition
- Chapter 3
- Just Enough SAS: A Quick-start Guide to SAS for Engineers
- Chapter 2