The U.S. Census Bureau publishes the results of the decennial census in many forms. Most of these are summary results that the Census Bureau tabulates in a predefined format. The Public Use Microdata Sample (PUMS) data is different: it gives researchers access to the actual questionnaire responses after confidentiality protections have been applied. Researchers are therefore free to analyze the data in whatever way best suits their research, without being restricted to predefined tables. This flexibility makes PUMS data one of the most popular forms of census data and provides rich research opportunities. This article presents a general introduction to the structure and contents of the 1990 PUMS data.
The 1990 PUMS data is a sample of records drawn from the actual Census Long-Form Questionnaires for specific geographic areas. It reflects the status of U.S. housing and population on April 1, 1990. The Census Bureau protects the privacy of individual respondents by editing the records and removing all identifying information before releasing PUMS data to researchers.
The 1990 PUMS data contains household records, which describe the characteristics of each housing unit, and person records for the people living in it. As a result, researchers can produce customized tabulations and statistical analyses of census data to meet the needs of specific research projects, while benefiting from Census Bureau data collection techniques and very large sample sizes that might not otherwise be feasible.
For 1990, the U.S. Census Bureau provides two independently drawn samples of the Long-Form Questionnaires: the 5% and the 1% PUMS data. The primary difference between the two samples is the geography they identify. In general, the 5% PUMS data is based on counties; in states where counties are not defined, county-equivalent areas are used instead. In Louisiana, for example, parishes are the county equivalents. The 1% PUMS data is a smaller sample based on metropolitan areas. The remainder of this article deals specifically with the 1990 PUMS 5% data.
Distribution of Census Questionnaires
The U.S. Census Bureau distributed census questionnaires to every housing unit in the United States. Long- and Short-Form Questionnaires were mailed or hand-delivered, and the share of units receiving the long form varied with population and the density of housing units. Taking these varying rates into account, approximately 15.9% of U.S. housing units received the Census Long-Form Questionnaire; the rest received the Short-Form Questionnaire.
The U.S. Census Bureau varied the distribution rate of the Long-Form Questionnaire, using three rates (1 in 6, 1 in 2, and 1 in 8) depending on the population. The Census Bureau chose to vary the rates "to provide relatively more reliable estimates for small populations" and "to decrease the respondent burden in more densely populated areas." (Census of Population and Housing, 1990: Public Use Microdata Samples Technical Documentation.) According to the Census Bureau documentation, the 1 in 6 rate was used unless information gathered in the 1988 precensus estimates or in work done in 1989 indicated that one of the other rates was appropriate. The other rates were used under the following conditions:
- If a governmental geographic area, such as a county, had a population of 2,500 persons or fewer, then 1 in 2 housing units received the long form.
- If a census tract or block-numbering area had 2,000 or more housing units, then 1 in 8 housing units received the long form.
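The rate-selection rules above can be expressed as a small decision function. This is a simplified sketch: the function and parameter names are hypothetical, the order in which the two special cases are checked is an assumption, and the precensus estimates the Bureau actually relied on are not modeled.

```python
from fractions import Fraction


def long_form_rate(gov_unit_population: int, tract_housing_units: int) -> Fraction:
    """Return the long-form sampling rate for a housing unit (simplified sketch).

    Assumption: the small-area rule is checked before the large-tract rule.
    """
    # Small governmental area (2,500 persons or fewer): 1 in 2
    if gov_unit_population <= 2_500:
        return Fraction(1, 2)
    # Large census tract or block-numbering area (2,000+ housing units): 1 in 8
    if tract_housing_units >= 2_000:
        return Fraction(1, 8)
    # Otherwise, the default rate: 1 in 6
    return Fraction(1, 6)


print(long_form_rate(1_800, 600))      # 1/2
print(long_form_rate(50_000, 2_400))   # 1/8
print(long_form_rate(50_000, 1_200))   # 1/6
```

Returning a `Fraction` keeps the rates exact, which is convenient if the rate is later inverted to get a sampling interval (6, 2, or 8).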
Housing units on American Indian reservations, Tribal Jurisdiction Statistical Areas, Alaska Native villages, and Trust Lands were sampled for the Long-Form Questionnaire using the same criteria as other governmental areas; however, the sampling rates were based on the size of the American Indian and Alaska Native populations. In Hawaii, the same sampling rates were applied to "census-designated places," because the Census Bureau does not recognize Hawaii's incorporated places. (Census of Population and Housing, 1990: Public Use Microdata Samples Technical Documentation.)
The PUMS data, like all other data released by the Census Bureau in print or on electronic media, is subject to strict confidentiality measures. These measures are imposed by law under Title 13 of the United States Code, which protects the confidentiality of individual respondents. Under these laws, questionnaire responses can be used only for statistical purposes, and Census Bureau employees are sworn to protect respondents’ identities.
PUMS records are selected as a stratified random sample after all of the confidentiality editing has been performed. The Census Bureau edits the long-form responses by:
- Removing names and addresses.
- Recoding responses into a smaller number of categories.
- Top-coding high values, such as income, into a single high-value category.
- Replacing actual values with a descriptive statistic, such as the median.
- Specifying only large geographic areas of 100,000 or more inhabitants.
- Scrambling the order of housing units within each geographic area, so that nothing can be inferred from the order of the records.
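To illustrate how top-coding and median substitution can work together, the sketch below replaces every value above a threshold with the median of the values that exceeded it. Combining the two steps this way, the `topcode` function name, and the threshold are all assumptions made for illustration; the Bureau's actual thresholds and replacement rules are given in its technical documentation.

```python
from statistics import median


def topcode(values: list[float], threshold: float) -> list[float]:
    """Replace values above `threshold` with the median of those values.

    Illustrative sketch only; the threshold is hypothetical.
    """
    above = [v for v in values if v > threshold]
    if not above:
        return list(values)          # nothing to top-code
    replacement = median(above)      # one descriptive statistic stands in for all high values
    return [replacement if v > threshold else v for v in values]


incomes = [18_000, 42_000, 95_000, 250_000, 410_000, 1_200_000]
print(topcode(incomes, threshold=200_000))
# [18000, 42000, 95000, 410000, 410000, 410000]
```

After top-coding, the three highest incomes are indistinguishable from one another, which is the point: an unusually large value can no longer single out a respondent.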
Because the PUMS data contains only a small fraction of the total population, the chance that any specific individual is included in the data is small.
Selection of PUMS 5% Data
The PUMS data was selected with a stratified systematic procedure that gave the Long-Form Questionnaire responses from each housing unit an equal probability of being included. The strata were defined so that the households within each stratum would be highly homogeneous with respect to characteristics of major interest.
The sampling universe for the PUMS 5% data was divided into three categories:
- Occupied housing units, including all occupants.
- Vacant housing units.
- Persons living in group quarters. (Examples of group quarters include hospitals, military bases, prisons, and dormitories.)
Strata were defined for each of the three major categories for a total of 1,049 strata:
- Occupied housing units were stratified into 936 strata by family type, race and Hispanic origin of the householder, tenure, and age, within each of the three types of governmental areas defined by their long-form sampling rate.
- Vacant housing units were stratified into 104 strata by vacancy status within each of the three types of areas.
- The population in group quarters was stratified into nine strata by type of group quarters, race, Hispanic origin, and age.
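The general technique of stratified systematic sampling can be sketched as follows: records are grouped by stratum, a random start is drawn for each stratum, and every k-th record is taken from there. This is a generic illustration, not the Bureau's exact procedure. A single interval of 20 (that is, 5%) is assumed for every stratum, and the function name and the tenure-only stratification are hypothetical.

```python
import random
from collections import defaultdict


def stratified_systematic_sample(records, stratum_of, interval=20, seed=None):
    """Take every `interval`-th record within each stratum after a random start.

    Because the start is uniform over [0, interval), every record has the same
    selection probability, 1/interval.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)   # group records by stratum
    sample = []
    for members in strata.values():
        start = rng.randrange(interval)       # independent random start per stratum
        sample.extend(members[start::interval])
    return sample


# Hypothetical housing units, stratified by tenure only
units = [{"id": i, "tenure": "owner" if i % 3 else "renter"} for i in range(1_000)]
sample = stratified_systematic_sample(units, lambda u: u["tenure"], interval=20, seed=1)
print(len(sample))  # about 5% of the 1,000 units
```

Stratifying first guarantees that each stratum is represented in proportion to its size, which a single systematic pass over unsorted records would not ensure.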
Structure of the 1990 PUMS 5% Data
The 1990 PUMS 5% data is structured as a series of groups of related records for a specified geographic area. The related records are of two types: one for the housing unit and zero or more for the people living in the housing unit. These records are referred to as the household record and the person record, respectively. The geographic information on the household record applies to both the household and the people living in the housing unit.
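Because a household record is followed by its zero or more person records, a reader of the file has to group consecutive records together. The sketch below assumes that column 1 of each record holds a record-type code, "H" for a household record and "P" for a person record. Treat this layout, the `Household` class, and the `group_records` function as assumptions for illustration, and check the record layout in the technical documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Household:
    record: str                                        # raw household record
    persons: list[str] = field(default_factory=list)   # raw person records, possibly none


def group_records(lines):
    """Group a stream of H/P records into households (assumed layout)."""
    households = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                       # skip blank lines
        kind = line[0]                     # assumed record-type code in column 1
        if kind == "H":
            households.append(Household(line))
        elif kind == "P":
            if not households:
                raise ValueError("person record appears before any household record")
            households[-1].persons.append(line)   # attach to the most recent household
        else:
            raise ValueError(f"unknown record type {kind!r}")
    return households


records = ["H001", "P001a", "P001b", "H002", "H003", "P003a"]
for hh in group_records(records):
    print(hh.record, len(hh.persons))
# H001 2
# H002 0
# H003 1
```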
The geographic information in the 1990 PUMS 5% data gives the location of the housing unit only in the broadest geographic terms. It consists primarily of three variables: a code for the region of the U.S. containing the state (e.g., New England, Middle Atlantic, or Pacific), a unique state code, and a Public Use Microdata Area (PUMA) code. A PUMA code is a 5-digit code that is unique within the state and is supplied by the State Data Center. Other descriptive variables that delineate the geographic areas are also included in the geographic information.
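Since a PUMA code is unique only within its state, any analysis that spans more than one state must key areas on the state code and the PUMA code together. A minimal sketch, in which `geo_key` and the codes in the example are hypothetical:

```python
from collections import Counter


def geo_key(state_code: str, puma_code: str) -> tuple[str, str]:
    """Combine state and PUMA codes into a key that is unique nationwide."""
    if len(puma_code) != 5 or not puma_code.isdigit():
        raise ValueError(f"expected a 5-digit PUMA code, got {puma_code!r}")
    return (state_code, puma_code)


# Hypothetical records: the same PUMA code appears in two different states
records = [("06", "00101"), ("06", "00101"), ("36", "00101")]

by_puma_only = Counter(puma for _, puma in records)
by_geo_key = Counter(geo_key(s, p) for s, p in records)
print(by_puma_only)  # Counter({'00101': 3}), which wrongly merges two states
print(by_geo_key)    # Counter({('06', '00101'): 2, ('36', '00101'): 1})
```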
In general, a PUMA is based on a county and the places within it. A county with a large population may contain more than one PUMA. For example, if a county's population exceeds 200,000 persons, the State Data Center may designate more than one PUMA within that county. However, the U.S. Census Bureau requires the State Data Center to define each PUMA so that the area it covers contains at least 100,000 people. In the California PUMS 5% data, some counties have many PUMA codes because of their very large populations.
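The 100,000-person minimum limits how finely a county can be divided. If PUMAs are assumed to nest entirely within a single county, the county can hold at most floor(population / 100,000) PUMAs, which is consistent with the 200,000-person example above. The sketch below applies this bound and checks proposed PUMA populations against the minimum. The function names and all populations are hypothetical.

```python
MIN_PUMA_POPULATION = 100_000  # Census Bureau minimum population per PUMA


def max_pumas(county_population: int) -> int:
    """Upper bound on PUMAs for a county, assuming PUMAs nest within the county."""
    return county_population // MIN_PUMA_POPULATION


def undersized_pumas(puma_populations: dict[str, int]) -> list[str]:
    """Return the PUMA codes whose population is below the minimum."""
    return [code for code, pop in puma_populations.items() if pop < MIN_PUMA_POPULATION]


print(max_pumas(199_999))  # 1
print(max_pumas(250_000))  # 2
print(undersized_pumas({"00101": 140_000, "00102": 95_000}))  # ['00102']
```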
The household record contains information about each household. Housing units are of three general kinds: occupied, vacant, or group quarters (e.g., hospitals, prisons, military bases, and dormitories). If the housing unit is occupied, the household record has one corresponding person record for each person living in the unit. If the housing unit is vacant, only the household record is present, with no corresponding person records. If the household record is for group quarters, there is exactly one corresponding person record, and the household record is a "dummy record" containing only geographic information, with all other variables coded as "not applicable."
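These record-count rules make a useful consistency check when reading the file. The sketch below encodes them. The unit-type labels (`"occupied"`, `"vacant"`, `"group_quarters"`) and the function name are hypothetical; in the actual file, the unit type must be derived from the coded variables on the household record.

```python
def check_household(unit_type: str, n_person_records: int) -> bool:
    """Check the person-record count for one household record.

    occupied       -> one or more person records
    vacant         -> no person records
    group_quarters -> exactly one person record (household record is a dummy)
    """
    rules = {
        "occupied": lambda n: n >= 1,
        "vacant": lambda n: n == 0,
        "group_quarters": lambda n: n == 1,
    }
    if unit_type not in rules:
        raise ValueError(f"unknown unit type {unit_type!r}")
    return rules[unit_type](n_person_records)


print(check_household("occupied", 3))        # True
print(check_household("vacant", 1))          # False
print(check_household("group_quarters", 1))  # True
```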
The household record contains geographic information and many variables regarding the housing unit, including:
- Economic status of the household
- Physical condition
- Size and number of units
- Amounts and percentages of various costs (e.g., utilities, rent, mortgage, condominium fees, taxes, and insurance)
The household record also contains some general information about the people living in the household, along with weights that, when applied, allow the researcher to use the PUMS 5% data to produce estimates of the 100% characteristics of housing units. For more information on the 100% characteristics, see the 1990 Census of Population and Housing Summary Tape File 1 Technical Documentation and the article "Understanding the Census STF3A File for California."
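Applying the weights means counting each sample record as the number of units it represents. A weighted total estimates how many housing units in the full population have a characteristic, and a weighted mean estimates the population average. In this sketch, the field names `weight`, `tenure`, and `rooms` and all of the values are hypothetical; the actual weight variables are named in the technical documentation.

```python
def weighted_total(records, predicate, weight_key="weight"):
    """Estimate how many population units satisfy `predicate`."""
    return sum(r[weight_key] for r in records if predicate(r))


def weighted_mean(records, value_key, weight_key="weight"):
    """Estimate the population mean of a numeric variable."""
    total_weight = sum(r[weight_key] for r in records)
    return sum(r[weight_key] * r[value_key] for r in records) / total_weight


households = [
    {"weight": 18, "tenure": "owner", "rooms": 6},
    {"weight": 22, "tenure": "renter", "rooms": 3},
    {"weight": 20, "tenure": "owner", "rooms": 5},
]
print(weighted_total(households, lambda h: h["tenure"] == "owner"))  # 38
print(round(weighted_mean(households, "rooms"), 2))                  # 4.57
```

Ignoring the weights here would give a mean of about 4.67 rooms, because the renter household, which represents the most units, would count no more than the others.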
A person record focuses on an individual member of the household. The primary purpose of the person record is to give detailed information on each individual in the household. The person record contains variables on:
- Demographic information
- Socioeconomic status
- Educational status
- Military service
- Employment/occupation status
- Commuting habits
- Relationship status within the household
- Disability and mobility information
- Many other variables
The person record also contains weights that, when applied, allow the researcher to use the PUMS 5% data to produce estimates of the 100% characteristics of the population, just as the household weights do for housing units.
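Geography is stored only on the household record, so person-level estimates for an area require carrying the household's state and PUMA codes down to each of its person records. The sketch below sums person weights by (state, PUMA). The nested-dictionary layout, the field names, and the values are hypothetical.

```python
from collections import defaultdict


def persons_by_area(households):
    """Estimate population by (state, PUMA) from weighted person records."""
    totals = defaultdict(int)
    for hh in households:
        key = (hh["state"], hh["puma"])   # geography comes from the household record
        for person in hh["persons"]:
            totals[key] += person["weight"]
    return dict(totals)


households = [
    {"state": "06", "puma": "00101", "persons": [{"weight": 19}, {"weight": 21}]},
    {"state": "06", "puma": "00101", "persons": []},   # vacant unit: no person records
    {"state": "06", "puma": "00102", "persons": [{"weight": 20}]},
]
print(persons_by_area(households))  # {('06', '00101'): 40, ('06', '00102'): 20}
```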
Originally revised: 11 Oct 96