Modeling and estimating income data in the presence of distinctive zero and heaped responses
Abstract
A major part of research data in the social sciences originates from survey interviews. Besides non-response, the accuracy of self-reported data, e.g. income, is an important research concern. Heaping, i.e. aberrant concentrations of response values at specific points of the range, is typical for retrospective data when the respondent is either uncertain about the true value or hesitant to report it. Heaped data entail a loss of information and have been found to distort effects at the macro and micro level. This work provides descriptive evidence for heaping behavior in the income data of the German National Educational Panel Study (NEPS). The data at hand strongly support the assumption that heaping behavior is not stochastic but deterministic. Its determinants are the response value itself and common socio-economic characteristics. This issue therefore needs to be addressed adequately, e.g. by a modeling strategy that explicitly takes the non-randomness of the heaping behavior into account. Accordingly, a heaping model is introduced that can account for different heaping behaviors. The proposed model is a mixture of two components: the latent distribution and the model for the heaping behavior. A zero-inflated log-normal distribution with a piecewise constant heaping mechanism is defined as the base model. The generality and flexibility of the model are illustrated by several modifications and extensions with respect to the latent distribution, the heaping pattern, and the heaping mechanism. In the application, all models are examined with respect to their fit to the NEPS income data. Posterior predictive checks are used to assess the overall fit of the models. This work also includes a comparative analysis of different random-walk Metropolis (RWM) algorithms with respect to their estimation accuracy and efficiency.
Besides the original RWM algorithm, blocking and adaptive strategies are investigated. The aim is to find an algorithm that mixes well, i.e. one that ensures that all modes are visited while the acceptance rate remains high. The results indicate that blocking can greatly improve mixing and convergence of the RWM algorithm, in contrast to the adaptive schemes considered. The performance of the models is fairly good; however, large differences in estimation exist with respect to runtime and efficiency. These differences are mainly attributable to the model assumed and the selected specification of the RWM algorithm.
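To make the structure of the base model concrete, the following minimal sketch simulates reported incomes from a zero-inflated log-normal latent distribution with a piecewise constant heaping probability. It is an illustration under stated assumptions, not the estimation procedure of this work: the function name, all parameter values, the cutpoints, and the single heaping pattern (rounding to the nearest multiple of 100) are hypothetical choices made only for this example.

```python
import random


def simulate_heaped_income(n, p_zero=0.1, mu=7.5, sigma=0.6,
                           cutpoints=(1000.0, 3000.0),
                           heap_probs=(0.2, 0.5, 0.8),
                           heap_unit=100, seed=0):
    """Simulate reported incomes with structural zeros and heaping.

    All defaults are hypothetical illustration values, not estimates.
    """
    rng = random.Random(seed)
    reported = []
    for _ in range(n):
        # Zero-inflation component: a structural zero with probability p_zero.
        if rng.random() < p_zero:
            reported.append(0.0)
            continue
        # Latent (true) income drawn from a log-normal distribution.
        latent = rng.lognormvariate(mu, sigma)
        # Piecewise constant heaping mechanism: the heaping probability
        # depends only on the interval in which the latent value falls.
        k = sum(latent >= c for c in cutpoints)
        if rng.random() < heap_probs[k]:
            # Assumed heaping pattern: round to the nearest multiple of
            # heap_unit, but never down to zero, so that heaped values
            # stay distinguishable from structural zeros.
            heaped = heap_unit * round(latent / heap_unit)
            reported.append(float(max(heap_unit, heaped)))
        else:
            # Non-heaped response: reported exactly (to two decimals).
            reported.append(round(latent, 2))
    return reported


incomes = simulate_heaped_income(1000)
positive = [x for x in incomes if x > 0]
share_zero = 1 - len(positive) / len(incomes)
share_heaped = sum(x % 100 == 0 for x in positive) / len(positive)
print(f"share of zeros: {share_zero:.3f}, share on multiples of 100: {share_heaped:.3f}")
```

In this sketch the heaping probability increases with the latent income, so that the response value itself drives the heaping behavior; dependence on socio-economic characteristics would require additional covariates, which are omitted here.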