1.0 Stata commands in this unit
pwd | Show current directory (pwd=print working directory) |
dir or ls | Show files in current directory |
cd | Change directory |
keep if | Keep observations if condition is met |
keep | Keep variables or observations |
drop | Drop variables or observations |
append | Append a data file to current file |
sort | Sort observations |
merge | Merge a data file with current file |
2.0 Demonstration and explanation
Example 2.1 – Subsetting data
Suppose we are undergraduates working on an honors thesis and we wish to analyze only a subset of the hs1 data file. We are studying "good readers," so we want to focus on the students who had a reading score of 60 or higher. The following shows how we can take the hs1 data file, use a separate folder called honors, and store there a copy of our data that contains only the students with reading scores of 60 or higher.
use hs1, clear
pwd
dir
ls
cd Stata_data
keep if read >= 60
describe
summarize read
save hsgoodread, replace
pwd
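The text above mentions a folder called honors, while the commands change into a folder named Stata_data. As a minimal sketch of creating and using an honors folder, assuming hs1.dta is in the current working directory and that no folder named honors exists there yet, we might type the following.

```stata
* Sketch only: assumes hs1.dta is in the current working directory
* and that a folder named honors does not already exist there.
use hs1, clear
* Create the folder (mkdir reports an error if it already exists).
mkdir honors
* Make honors the working directory.
cd honors
* Keep only the good readers and save the subset inside honors.
keep if read >= 60
save hsgoodread, replace
* Return to the directory we started from.
cd ..
```

Saving after the cd command is what places hsgoodread in the new folder, since save writes to the working directory unless a path is given.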
Example 2.1, continued – Keeping variables
Further suppose that our data file had a great many variables, say 2000, but we care about only a handful of them: id, female, read and write. We can subset our data file to keep just those variables as shown below.
keep id female read write
save hskept, replace
describe
list in 1/20
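When a file really does have thousands of variables, we may prefer not to load all of them first. A possible alternative, sketched here under the assumption that hsgoodread.dta is in the working directory, is to name the variables in the use command itself so that only they are read into memory.

```stata
* Sketch only: assumes hsgoodread.dta is in the working directory.
* Read just four variables from the file instead of loading everything.
use id female read write using hsgoodread, clear
describe
```

The result in memory should match what keep produces above; the difference is only in how much data Stata has to read.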
Example 2.1, continued – Dropping variables
Instead of keeping just a handful of variables, we might want to get rid of just a handful of variables in our data file. Below we show how we could get rid of the variables ses and prog.
use hsgoodread, clear
drop ses prog
save hsdropped, replace
describe
list in 1/10
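The drop command also accepts an if condition, so it can remove observations as well as variables. The sketch below assumes hs1.dta is in the working directory. Its first drop should leave the same observations as keep if read >= 60, because Stata treats a missing value as larger than any number, so neither command removes students with a missing reading score.

```stata
* Sketch only: assumes hs1.dta is in the working directory.
use hs1, clear
* Drop the students who scored below 60 on reading.
drop if read < 60
count
* Optional extra step: also remove students whose reading score is missing.
drop if missing(read)
count
```

Comparing the two counts shows how many observations, if any, had a missing value for read.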
Example 2.2 – Appending data
Now we have moved on to our master’s thesis. We have a folder called masters and we have been given a file with the data for the males (called hsmale) and a file for the females (called hsfemale). We need to combine these files together to be able to analyze them, as shown below. In this example, we are adding cases, sometimes called "stacking" datasets.
dir
use hsmale
tabulate female
append using hsfemale
tabulate female
save hsmasters, replace
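After stacking, it can be useful to know which file each observation came from. A hedged sketch follows, assuming a version of Stata whose append command supports the generate() option. The variable name source is our own choice for illustration and is not part of the original files.

```stata
* Sketch only: assumes hsmale.dta and hsfemale.dta are in the working directory.
use hsmale, clear
* source is a hypothetical new variable: 0 = from hsmale, 1 = from hsfemale.
append using hsfemale, generate(source)
* Cross-check the file of origin against the female variable.
tabulate source female
```

If the two files were built as described, every observation with source equal to 0 should be male and every observation with source equal to 1 should be female.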
Example 2.3 – Merging data
Now we are working on our dissertation and, as with our master's thesis, we have been given two files. In this case, we have a file that has the demographic information (called hsdem) and a file with the test scores (called hstest), and we wish to merge these files so that each student's demographics and test scores sit in one observation. First, we need to open, sort and save each data file. Both data files must be sorted by the same variable, here id. Next, we use the merge command to merge the two datasets, and we tabulate the _merge variable that Stata creates to check how the observations matched.
dir
use hsdem, clear
list
sort id
save hsdem, replace
use hstest, clear
list
sort id
save, replace
use hsdem
merge id using hstest
list
tab _merge
save hsdiss
cd ..
dir
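The command merge id using hstest uses an older form of the merge syntax. Stata 11 and later also accept a form that states the type of match explicitly and does not require the files to be sorted beforehand. A minimal sketch, assuming id identifies each student uniquely in both files:

```stata
* Sketch only: assumes hsdem.dta and hstest.dta are in the working directory
* and that id uniquely identifies observations in each file.
use hsdem, clear
* Confirm that id is a unique identifier in the file in memory.
isid id
* One-to-one merge on id; no prior sorting is needed with this syntax.
merge 1:1 id using hstest
tabulate _merge
```

In the result, _merge equals 3 for students found in both files, 1 for students found only in hsdem, and 2 for students found only in hstest.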
3.0 For more information