By Amy L. Watts
From Education and the Common Good
pp. 35-44, published 2001
The University of Kentucky Survey Research Center conducts semiannual surveys of Kentucky residents. This report uses data from the surveys conducted in the spring of 1998, the spring of 2000, and the fall of 2000. Households for all three surveys were selected using random-digit dialing, a procedure giving every residential telephone line in Kentucky an equal probability of being called. All samples include noninstitutionalized Kentuckians 18 years of age or older. Calls for the Spring 1998 survey were made between May 11 and June 10, 1998. For the Spring and Fall 2000 surveys, calls were made from May 18 to June 26, 2000 and October 28 to November 21, 2000, respectively. The number of completed interviews was 658 in the spring of 1998, 1,070 in the spring of 2000, and 859 in the fall of 2000, with response rates of 37 percent, 51.2 percent, and 39.7 percent, respectively. At the 95 percent confidence level, the margins of error were 3.8, 3.0, and 3.3 percentage points on the Spring 1998, Spring 2000, and Fall 2000 surveys, respectively.
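The reported margins of error are consistent with the standard formula for a proportion under simple random sampling. The sketch below assumes that formula, a worst-case proportion of 0.5, and a critical value of 1.96; the report does not state how the margins were computed, so the function and its defaults are illustrative assumptions.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Half-width of a confidence interval for a proportion, in percentage points.

    Assumes simple random sampling; z=1.96 corresponds to 95 percent
    confidence and p=0.5 is the most conservative proportion.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Completed interviews reported for each survey
completed = {"Spring 1998": 658, "Spring 2000": 1070, "Fall 2000": 859}

margins = {name: round(margin_of_error(n), 1) for name, n in completed.items()}
for name, moe in margins.items():
    print(f"{name}: +/- {moe} percentage points")
```

Under these assumptions the three sample sizes give 3.8, 3.0, and 3.3 percentage points, matching the figures above.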
We use multivariate probit models and a two-part probit and ordinary least squares (OLS) regression model to estimate the relationship between the answers to the various survey questions used and the explanatory variables of education, income, age, gender, race, ethnicity, and urbanity (or rurality) of respondents’ county of residence. Not all explanatory variables were used in every model.
On every survey, including the latest available survey data from the fall of 2000, the University of Kentucky Survey Research Center asks questions regarding reliance on welfare or food stamps as a source of income:
They also ask whether the respondent is registered to vote:
For all three dependent variables (WELFARE, FOOD, REGISTER), a positive answer equals one and a negative response equals zero. The number of sample participants who had used welfare or food stamps was relatively low. Therefore, data from the Spring and Fall 2000 surveys were pooled to increase the sample size and variability of these two dependent variables. The larger sample sizes yielded more reliable probit model results.
The Kentucky Long-Term Policy Research Center has been tracking civil society trends in Kentucky for the past several years with a series of questions on those surveys conducted in the spring of each year, including the most recent spring survey from 2000:
The dependent variables for the first (VOLUNTEER) and third (DONATE) questions each equal one if the respondent answers yes to the corresponding question, and zero otherwise. The second question is asked only if the respondent has volunteered in the past 12 months. The dependent variable for this question (HOURS) is the number of hours the person volunteered in a typical month. The maximum response allowed was 40 hours, to minimize distortions caused by extreme values.
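A minimal sketch of this coding follows. It assumes that answers above the maximum were recorded as 40 (top-coding); the report says only that 40 hours was the maximum response allowed, and the function name is hypothetical.

```python
MAX_HOURS = 40  # maximum monthly hours allowed, as stated in the text

def code_volunteer(volunteered, hours=None):
    """Return (VOLUNTEER, HOURS) for one respondent.

    HOURS is None for non-volunteers because the hours question is
    asked only of respondents who volunteered in the past 12 months.
    Capping larger answers at MAX_HOURS is an assumption.
    """
    if not volunteered:
        return 0, None
    return 1, min(hours, MAX_HOURS)

print(code_volunteer(True, 55))
print(code_volunteer(False))
```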
The Center also asks a series of questions on those surveys conducted in the fall of each year regarding other aspects of civil society, including community involvement and leadership activities. In the fall of 2000 the Center asked:
The Center also inquires about cultural and entrepreneurial activity on these surveys:
The dependent variables in all five cases (GROUP, LEADER, LEADPROG, CULTURE, ENTREP) are one for affirmative responses, and zero otherwise. The second question is asked only in the case of a positive response to the first question; therefore, the sample size is smaller than the total possible number of respondents to the survey.
In the spring of 1998 and 2000 the Center asked parents of children ages 8 years old and younger how often they read to their children:
The second and third questions were asked only if the parent had children under the age of eight. Since only a small portion of the sample had children meeting this age criterion, data from the two surveys were pooled to increase sample size. There was not enough variation in the answers to the second question to model the effect of the independent explanatory variables on the probability that a parent reads to his or her children. Practically everyone in the sample reads to their small children. Only 18 of the 379 respondents with children under the age of eight answered that they did not read to their children. For the third question, if a parent reads to their children "about every day," the dependent variable (DAILY) equals one, and zero if the parent reads to their children "about once a week" or less. Therefore, the remaining three choices of "about once a week," "about once a month," and "less than once a month" were grouped together in one category.
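The grouping of the four response categories into DAILY can be sketched as follows. The category strings are taken from the text; the function name and the error handling are illustrative assumptions.

```python
# Response categories quoted in the text for reading frequency
CATEGORIES = (
    "about every day",
    "about once a week",
    "about once a month",
    "less than once a month",
)

def code_daily(frequency):
    """Return DAILY: 1 for 'about every day', 0 for the three other categories."""
    if frequency not in CATEGORIES:
        raise ValueError(f"unexpected response: {frequency!r}")
    return int(frequency == "about every day")

print([code_daily(c) for c in CATEGORIES])
```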
The University of Kentucky College of Nursing asked Kentuckians about their smoking habits on the survey conducted in the spring of 2000:
The dependent variable (SMOKE) equals one if the person had smoked any cigarettes in the past 30 days, and equals zero otherwise. Generally accepted practices require more information than that provided by this question alone to establish whether a person is a smoker. However, the analysis and the implications of the results did not require the establishment of smoker status. In addition, the portion of the sample responding "yes" is approximately 30 percent, which is the approximate current adult smoking rate for Kentucky.
For every outcome except volunteerism, multivariate probit models were used to estimate the relationship between the outcome and the predictor variables of education, income (excluding welfare and food stamps), age, gender, race and ethnicity, and location of residence.
Education. To estimate the relationship between education and the probability of the various outcomes, a series of dichotomous variables was used, with a high school diploma or equivalent as the reference group or base case. The first dichotomous education variable (LTHS) equals one if a person did not graduate from high school, and zero otherwise. The second (SC2YR) equals one if a person has attended college without graduating or has earned a two-year degree, and zero otherwise. The third (BAORMORE) equals one if a person has earned at least a bachelor's degree, and zero otherwise.
Income. Income was also entered as a series of dichotomous variables, with household incomes of $20,000 and below as the base case. These variables (INCOME1, INCOME2, INCOME3, and MISINC) equal one if a person's household income is between $20,000 and $40,000, is between $40,000 and $70,000, exceeds $70,000, or is missing, respectively. All income explanatory variables are zero otherwise. Survey respondents are often uncomfortable revealing their household income, so many observations have no income response. The dichotomous variable MISINC flags these observations so that they are retained in the models rather than dropped.
Age. The variable describing age (AGE) is a continuous variable that represents the age of each person in the sample. The age of each person is divided by ten and the squared term divided by 1000 to reduce scale problems resulting from wide ranges in magnitude between the dependent and independent variables. Age was entered as a quadratic in some models to allow the associated probability to vary with age in a nonlinear fashion.
Gender. The explanatory variable indicating respondent’s gender (GENDER) equals one if female and zero if male.
Race and Ethnicity. The variable controlling for race and ethnicity (RACE) equals one if a person is white, non-Hispanic, and zero otherwise. The survey asks respondents to describe their racial or ethnic background within the following available categories: white, African American, Hispanic or "some other race." In the case of the last response, the person is asked to specify.
Location of residence. This is a dichotomous variable indicating whether the county of residence is classified as urban or rural by the Census Bureau. The variable (URBAN) is set to 1 if the county is urban and 0 if rural.
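The variable definitions above can be sketched as a single coding function. The variable names come from the text; the education labels, the argument names, and the treatment of incomes falling exactly on a bracket boundary are assumptions, and the squared age term is omitted.

```python
def code_respondent(education, income, age, female, white_non_hispanic, urban):
    """Code one respondent into the explanatory variables described above.

    education: "lt_hs", "hs", "some_college", or "ba_plus" (hypothetical labels);
               "hs" is the base case, so all three education dummies are 0.
    income:    household income in dollars, or None if not reported.
    """
    row = {
        "LTHS": int(education == "lt_hs"),
        "SC2YR": int(education == "some_college"),
        "BAORMORE": int(education == "ba_plus"),
        "INCOME1": 0, "INCOME2": 0, "INCOME3": 0, "MISINC": 0,
        "AGE": age / 10,          # age is divided by ten, per the text
        "GENDER": int(female),    # 1 = female, 0 = male
        "RACE": int(white_non_hispanic),
        "URBAN": int(urban),
    }
    if income is None:
        row["MISINC"] = 1         # keep the observation, flag missing income
    elif income > 70_000:
        row["INCOME3"] = 1
    elif income > 40_000:         # boundary handling is an assumption
        row["INCOME2"] = 1
    elif income > 20_000:
        row["INCOME1"] = 1
    return row                    # $20,000 and below: all income dummies 0

example = code_respondent("some_college", 27_500, 45, True, True, False)
print(example)
```

Because each set of dummies leaves out a base case, a respondent with a high school diploma and household income of $20,000 or less has zeros for every education and income variable.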
The explanatory variables used to predict the probability of engaging in each activity are shown in Table A.1. This table gives the mean values of the dependent variables and each explanatory variable for each dataset: the spring of 1998, the spring and fall of 2000, and the pooled datasets of Spring 1998 and 2000 and of spring and fall of 2000. When reading this table, keep in mind that all variables, excluding age and its squared term, are binary, taking only the values 1 or 0. Therefore, the mean value of 0.13 for less than high school (LTHS) in the Fall 2000 sample indicates that approximately 13 percent of the sample had an educational attainment level below a high school diploma. The rest of the values in this table should be interpreted in a similar manner. To find the actual mean age, multiply the mean value of AGE by ten.
On average, a Kentuckian in each of these datasets is a white, non-Hispanic female in her mid-40s who lives in a nonmetropolitan area, has some college education or a two-year degree, and earns between $25,000 and $30,000 annually in household income. This is the "average" or "typical" Kentuckian discussed throughout the report and used to predict the outcomes of all graphs, unless otherwise specified. The pooled dataset used to model the frequency of parents reading to their children has a lower average age of approximately 34 due to the nature of the topic analyzed.
In all, 12 models were estimated using these three datasets. In estimating education’s association with welfare and food stamps, all explanatory variables were used, excluding income. The probit models used in these cases are reduced form models that explain the total effect of education on the use of these programs, including its direct relationship and its indirect relationship through income. The quadratic age term was included, since income has been shown to vary nonlinearly with age in similar models and welfare and food stamps are forms of income. The remaining models, excluding volunteerism, are probit models that use all explanatory variables except the quadratic age term. Finally, volunteerism was estimated using a two-part probit and ordinary least squares (OLS) regression model that incorporated all the explanatory variables, excluding the square of age.
The probability that an individual participates in activity j (e.g. registering to vote) is estimated using a probit model. Whether or not an individual engages in one of the activities analyzed in this section is a dichotomous outcome: an individual either participates in the activity or does not. To model this behavior, the probit assumes that an unobserved variable, called Z, determines whether a positive outcome is observed. When Z exceeds a critical value, which we will refer to as Z*, we observe that the individual engages in the activity in question; when Z is less than Z*, we observe that the individual does not engage in the activity. Z is normally distributed with a mean of zero and a standard deviation of one. The probability of activity participation or engagement can be estimated by evaluating the standard normal cumulative distribution function (CDF) for the probit model’s estimate of Z. The higher the value of Z the greater the probability of activity by the person observed. The unobserved variable Z is modeled by:
1) $Z_j = X\beta + \mu, \quad \mu \sim N(0, 1)$
X is a set of explanatory variables, including education, age, income (in most cases), and demographic variables, and μ is a random error term. The probability that an individual shows the behavior in question is given by
2) $\Pr[Y_j > 0] = \Phi(Z_j)$
where $\Phi$ is the standard normal CDF.
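As a sketch of how equations 1 and 2 turn a set of coefficients into a predicted probability, the code below evaluates the standard normal CDF at the linear index. The coefficient values, the constant term CONST, and the function names are hypothetical and are not estimates from this report.

```python
import math

def standard_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(x, beta):
    """Predicted probability: the normal CDF evaluated at the index X * beta."""
    z = sum(beta[name] * value for name, value in x.items())
    return standard_normal_cdf(z)

# Hypothetical coefficients, for illustration only
beta = {"CONST": -0.5, "BAORMORE": 0.4, "AGE": 0.1}

# A 45-year-old college graduate (AGE is age divided by ten)
x = {"CONST": 1, "BAORMORE": 1, "AGE": 4.5}

p = probit_probability(x, beta)
print(f"Predicted probability: {p:.3f}")
```

Because the normal CDF is increasing, a positive coefficient raises the predicted probability and a negative one lowers it, but the size of the change depends on the values of the other variables.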
The second part of the volunteerism model uses a linear model to predict the number of hours volunteered, conditional on the person volunteering.(1) This linear model is estimated by ordinary least squares using only those respondents who volunteer:
3) $(Y_j \mid Z_j > Z_j^*) = X\beta + \mu, \quad \mu \sim N(0, \sigma^2_\mu)$
The maximum likelihood estimate for the two-part model is obtained by combining the estimate for β in equation 1 with the estimate for β and $\sigma^2_\mu$ in equation 3. The expected number of hours volunteered is then:
4) $E[Y_{j=\text{volunteer}}] = P_{j=\text{volunteer}} \cdot [(X\beta) + \sigma_\mu]$
where $P_{j=\text{volunteer}} = \Pr[Y_{j=\text{volunteer}} > 0] = \Phi(Z_{j=\text{volunteer}})$, and $\Phi$ is the standard normal CDF. The second part of the model produces unbiased consistent estimates of the number of hours volunteered. This formal two-part model gives the expected number of hours volunteered and these hours are then valued at the average wage rate for Kentucky in 2000.
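The two-part calculation can be sketched as follows, mirroring equation 4 term by term as printed. All coefficients, the value of sigma, and the wage rate are hypothetical placeholders, not estimates or figures from this report.

```python
import math

def standard_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(x, beta):
    """Compute X * beta for one respondent."""
    return sum(beta[name] * value for name, value in x.items())

def expected_hours(x, beta_probit, beta_ols, sigma):
    """Expected monthly hours, following equation 4 as printed.

    Part 1: probability of volunteering from the probit index.
    Part 2: linear prediction for volunteers plus the sigma term.
    """
    p_volunteer = standard_normal_cdf(linear_index(x, beta_probit))
    return p_volunteer * (linear_index(x, beta_ols) + sigma)

# Hypothetical inputs, for illustration only
x = {"CONST": 1}
beta_probit = {"CONST": 0.0}   # index of 0 gives a probability of 0.5
beta_ols = {"CONST": 8.0}
sigma = 2.0
hourly_wage = 15.0             # placeholder, not Kentucky's actual average wage

hours = expected_hours(x, beta_probit, beta_ols, sigma)
print(f"Expected hours per month: {hours:.2f}")
print(f"Monthly value at the assumed wage: ${hours * hourly_wage:.2f}")
```

Keeping the two parts separate means each set of coefficients can be estimated on its own sample: the probit on all respondents, and the linear model only on those who volunteer.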
The parameter estimates of each of the twelve models previously described are given in Tables A.2 and A.3. These parameter estimates were used in conjunction with the averages from Table A.1 to produce the predicted probabilities presented throughout the text of this report.
(1) For more on the two-part model used to predict volunteer hours, refer to Naihua Duan, Willard G. Manning, Jr., Carl N. Morris, and Joseph P. Newhouse, "A Comparison of Alternative Models for the Demand for Medical Care," Journal of Business and Economic Statistics, 1.2 (1983): 115-126.