Beginning in Stata 12 you can read Excel (.xls and .xlsx) files directly using the import excel command. The sheet() option allows us top specify from which sheet of the spreadsheet we want to read, and by appending the data together, we can read data from multiple sheets. To illustrate how this is accomplished we have an Excel file named hospital.xls.
The file, hospital.xls, has four Sheets each with the same format. In cell A1 is the hospital name. In cells B2:D5 are the data for each hospital. The variables names are listed in Row 2. Here is a sample of what one of the Sheets looks like:
A B C D 1 santa monica 2 x y z 3 39 68 8 4 19 8 3 5 5 82 6
As you can see, the hospital is Santa Monica and the variable names are x, y and z. Now here is what we want to do with the Sheets in this Excel file. We want to capture the hospital name and add to each row of the data. Then we want to append the data from each of the hospitals together into a single Stata data file. Additionally, we want to create a numeric variable that codes the hospital name. Here is the code to do this which you can paste into your do-file editor.
/* do-file code to read Excel spreadsheets */ /* loop thru each of the Sheets beginning with Sheet4 */ forvalues i=4(-1)1 { /* capture the hospital name from cell A1 */ import excel using hospital.xls, /// sheet(Sheet`i') cellrange(A1:A1) clear local hname = A[1] /* local macro with hospital name */ /* get the data beginning in cell B2 */ import excel using hospital.xls, /// sheet(Sheet`i') cellrange(B2) firstrow clear generate str12 hname = "`hname'" /* create variable with hospital name */ /* append and save data */ if `i'==4 { save hospital, replace } else { append using hospital.dta save hospital, replace } } encode hname, gen(hnum) /* create numeric hospital code */ order hname hnum /* reorder data placing hospital name first */ save hospital, replace /* final save of the data */
The two import excel commands are worth commenting on in detail. The first one looks like this:
import excel using hospital.xls, /// sheet(Sheet`i') cellrange(A1:A1) clear
We use the full file name with the .xls extension following using. The option sheet, of course, indicates which Sheet (1 thru 4) that we want to read. The cellrange option, in this case, indicates that we want to read a single cell (cell A1). The clear option clears out any data already in memory.
Here is the second import excel command:
import excel using hospital.xls, /// sheet(Sheet`i') cellrange(B2) firstrow clear
The sheet and clear options are the same as before. This time the cellrange option has just a single values, B2, which indicates the upper left-hand starting location to begin reading the data. We have added the firstrow option to indicate that the first row contains the variable names. When you run the do-file you end up with a dataset that looks like this:
. list, clean nolabel hname hnum x y z 1. ucla 4 37 70 9 2. ucla 4 15 80 3 3. ucla 4 29 85 7 4. santa monica 3 39 68 8 5. santa monica 3 19 82 3 6. santa monica 3 25 82 6 7. harbor 2 38 67 9 8. harbor 2 20 80 4 9. harbor 2 25 82 7 10. county 1 40 69 10 11. county 1 20 80 6 12. county 1 23 79 6
And that’s how to read data from multiple sheets from Excel into Stata.