Contents
ACKNOWLEDGMENTS 4
1. INTRODUCTION 16
1.1. PIAAC Sample Design 18
1.2. Proficiency Measures in PIAAC 20
1.3. Types of Estimates, Areas of Interest, and Results Website 21
2. BACKGROUND 24
2.1. Approaches Applied to Literacy Data 25
2.2. Review of Major Federal SAE Programs 26
2.3. Review of Recent Developments in SAE 27
2.4. Key Features of PIAAC Indirect Estimation Methodology 28
3. DIRECT ESTIMATION 31
3.1. Direct County Estimates 32
3.2. Survey Regression Estimation (SRE) 34
3.3. Variance Smoothing 37
3.3.1. Variance Estimation Smoothing Model for Proportions 39
3.3.2. Variance Estimation Smoothing Model for Averages 40
4. COVARIATE SELECTION 42
4.1. Initial Identification of County and State Covariates 43
4.2. Initial Set of County and State Covariates Sources 44
4.2.1. Initial Set of Sources for County-Level Covariates 45
4.2.2. Initial Set of Sources for State-Level Covariates 46
4.3. Covariates Selection Process 47
4.3.1. Phase 1 - Covariates Reduction 48
4.3.2. Phase 2 - Cross Validation 52
5. MODEL DEVELOPMENT 56
5.1. Summary of Simulation Results 57
5.2. Final SAE Models 58
5.2.1. Area-Level Bivariate HB Linear Three-Fold Model for Proportions 58
5.2.2. Modeling Averages - Area-Level Univariate HB Linear Three-Fold Model 61
5.3. Model Fitting 61
5.3.1. Software 62
5.3.2. Model Estimation 63
5.4. Predicted Values 68
5.4.1. Indirect Estimates for Sampled Counties 69
5.4.2. Indirect Estimates for Nonsampled Counties 69
5.4.3. Indirect Estimates for States and Nation 70
5.5. Measures of Precision for the Indirect Estimates 71
5.5.1. Credible Intervals 71
5.5.2. Coefficient of Variation 71
5.5.3. Assessment of Precision Measures 72
5.6. Simultaneous Inference 73
6. MODEL DIAGNOSTICS, SENSITIVITY ASSESSMENT, AND EVALUATION 76
6.1. Internal Model Validation 76
6.1.1. Convergence and Mixing Diagnostics 78
6.1.2. Checks on Model Assumptions 80
6.1.3. Model Sensitivity Checks 86
6.1.4. Changes in the Model Specification 100
6.2. External Model Validation 104
6.2.1. Model Validation Graphs 104
6.2.2. Comparison of Aggregates of Model Predictions and Direct Estimates 110
7. SUMMARY 116
REFERENCES 117
APPENDIX A. LIST OF POTENTIAL COVARIATES 122
APPENDIX B. SIMULATION STUDY RESULTS 136
APPENDIX C. SELECT STUDY RESULTS 170
APPENDIX D. NEGATIVE ESTIMATES 221
Table 1-1. Number of completed cases for PIAAC samples: 2012/2014/2017 19
Table 1-2. Number of counties with at least one completed case: 2012/2014/2017 19
Table 1-3. Number of completed cases per county: 2012/2014/2017 19
Table 3-1. Distribution of the proportion of variance associated with multiple imputation for direct estimates across counties: 2012/2014/2017 33
Table 3-2. Summary of variance estimates prior to SRE, after SRE, and after smoothing: 2012/2014/2017 38
Table 3-3. Parameter estimates for the variance smoothing model for proportions: 2012/2014/2017 40
Table 3-4. Parameter estimates for the variance smoothing process for county-level variances for literacy and numeracy average: 2012/2014/2017 41
Table 4-1. List of phase 1 select covariates, including their label, source, and year 51
Table 4-2. Predictor variables selected in phase 1, by the outcome model and LASSO lambda option 52
Table 4-3. Covariates used in cross validation for literacy proportions and results of summed squared differences between predicted proportions and direct estimates: 2012/2014/2017 54
Table 4-4. List of covariates for the final small area models 55
Table 4-5. Correlation coefficients among covariates for the final small area model: 2012/2014/2017 55
Table 5-1. Initial parameter values of β for literacy and numeracy proportions and averages: 2012/2014/2017 63
Table 5-2. Regression coefficients and components of the variance-covariance matrices of random effects for the final HB models: For literacy and numeracy proportions: 2012/2014/2017 66
Table 5-3. Regression coefficients and variances of random effects for the final HB models: For literacy and numeracy averages: 2012/2014/2017 68
Table 5-4. National-level indirect and direct estimates: 2012/2014/2017 70
Table 5-5. Distribution of credible interval widths and coefficients of variation for indirect estimates for literacy proportion at or below Level 1: 2012/2014/2017 72
Table 6-1. Convergence diagnostics for the MCMC: 2012/2014/2017 80
Table 6-2. Variance inflation factors: 2012/2014/2017 81
Table 6-3. Posterior predictive checks for bivariate HB model: Summaries of posterior predictive statistics for literacy proportions at or below Level 1: 2012/2014/2017 86
Table 6-4. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices (LKJ priors for the Cholesky factors of the decomposed matrices):... 87
Table 6-5. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices (IW priors for the matrices): 2012/2014/2017 91
Table 6-6. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices and fewer initial values: 2012/2014/2017 94
Table 6-7. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices and fewer initial values and noninformative choice of hyperparameters:... 97
Table 6-8. Convergence diagnostics for the MCMC, using a smaller number of MC samples and default software parameters for the sampling algorithms: 2012/2014/2017 103
Table 6-9. Comparison of aggregated county-level indirect and direct estimates for Literacy P1, by subgroup: 2012/2014/2017 112
Figure 1-1. Number of completed cases by states with at least one completed case, sorted by number of completed cases: 2012/2014/2017 20
Figure 4-1. Covariate selection process diagram 43
Figure 5-1. Illustration of county-to-county comparison: 2012/2014/2017 74
Figure 6-1. Residual plots for the first set of residuals: 2012/2014/2017 82
Figure 6-2. Residual plots for the second set of residuals (conditional on the random effects components): 2012/2014/2017 83
Figure 6-3. Posterior means and standard deviations for the regression coefficients under HB models with LKJ prior on the correlation matrix versus LKJ prior on the Cholesky factor... 88
Figure 6-4. Posterior means and standard deviations for county-level literacy proportions under HB models with LKJ prior on the correlation matrix versus LKJ prior on the Cholesky factor... 89
Figure 6-5. Posterior means and standard deviations for the regression coefficients under HB models with LKJ prior on the correlation matrix versus IW prior on the variance matrix:... 92
Figure 6-6. Posterior means and standard deviations for county-level literacy proportions under HB models with LKJ prior on the correlation matrix versus IW prior on the variance matrix:... 93
Figure 6-7. Posterior means and standard deviations for the regression coefficients under HB models with IW prior on the variance matrix and different sets of initial values: 2012/2014/2017 95
Figure 6-8. Posterior means and standard deviations for county-level literacy proportions under HB models with IW prior on the variance matrix and different sets of initial values:... 96
Figure 6-9. Posterior means and standard deviations for the regression coefficients under HB models with IW prior on the variance matrix and different sets of hyperparameters:... 98
Figure 6-10. Posterior means and standard deviations for county-level literacy proportions under HB models with IW prior on the variance matrix and different sets of hyperparameters:... 99
Figure 6-11. Posterior means and standard deviations for regression coefficients under univariate and bivariate HB models: 2012/2014/2017 101
Figure 6-12. Posterior means and standard deviations for county-level literacy proportions under univariate and bivariate HB models: 2012/2014/2017 102
Figure 6-13. Literacy proportion - Histograms of differences between survey regression estimates and indirect estimates: 2012/2014/2017 106
Figure 6-14. Literacy proportion - Shrinkage plots of point estimates, by sample size: 2012/2014/2017 107
Figure 6-15. Literacy proportion - Indication of coverage by credible interval: 2012/2014/2017 108
Figure 6-16. Literacy proportion - Comparison between survey regression estimates and indirect estimates: 2012/2014/2017 109
Figure 6-17. Literacy proportion - Comparison between model standard errors and smoothed standard errors: 2012/2014/2017 110
Table A-1. List of county-level variables, by source and year 123
Table A-2. List of state-level variables, by source and year 128
Table A-3. PIAAC county- and state-level variable correlations with literacy/numeracy proficiency outcomes: 2012/2014/2017 130
Table A-4. PIAAC county- and state-level variable LASSO selection results with literacy/numeracy proficiency outcomes: 2012/2014/2017 135
Table B-1. Ratio of the estimated true sampling variance of the PSU-level survey regression estimate to the average direct estimate for the simulation PSUs 145
Table B-2. Features of the models studied, comparing unmatched, "x" (vs. matched, "."), use of the estimated true variance, "x" (vs. smoothed "."), STAN, "x" (vs. JAGS, "."), use of the... 147
Table B-3. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, averaged over all PSUs and for groups of PSUs,... 149
Table B-4. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, for groups of PSUs, classified by the percentage... 150
Table B-5. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, for groups of PSUs, classified by the modeled... 151
Table B-6. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, state-level estimates 153
Table B-7. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 155
Table B-8. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 156
Table B-9. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 157
Table B-10. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 158
Table C-1. Different sets of covariates used in cross validation for literacy average: 2012/2014/2017 180
Table C-2. Different sets of covariates used in cross validation for numeracy proportions: 2012/2014/2017 181
Table C-3. Different sets of covariates used in cross validation for numeracy average: 2012/2014/2017 181
Table C-4. Distribution of credible interval widths and coefficients of variation for small area estimates: 2012/2014/2017 182
Table C-5. Evaluation of aggregate estimates for literacy proportion at Level 2: 2012/2014/2017 200
Table C-6. Evaluation of aggregate estimates for literacy proportion at or above Level 3: 2012/2014/2017 203
Table C-7. Evaluation of aggregate estimates for literacy average: 2012/2014/2017 206
Table C-8. Evaluation of aggregate estimates for numeracy proportion at or below Level 1: 2012/2014/2017 209
Table C-9. Evaluation of aggregate estimates for numeracy proportion at Level 2: 2012/2014/2017 212
Table C-10. Evaluation of aggregate estimates for numeracy proportion at or above Level 3: 2012/2014/2017 215
Table C-11. Evaluation of aggregate estimates for numeracy average: 2012/2014/2017 218
Table D-1. Counties with negative indirect estimates for literacy proportion at or above Level 3: 2012/2014/2017 222
Table D-2. Counties with negative indirect estimates for numeracy proportion at or above Level 3: 2012/2014/2017 222
Table D-3. Counties with positive indirect estimate and negative lower bound of credible intervals for literacy proportion at or below Level 1: 2012/2014/2017 223
Table D-4. Counties with positive indirect estimate and negative lower bound of credible intervals for literacy proportion at or above Level 3: 2012/2014/2017 223
Table D-5. Counties with positive indirect estimates and negative lower bound of credible intervals for numeracy proportion at or above Level 3: 2012/2014/2017 223
Figure B-1. Comparison of estimated true variances of the direct and survey regression estimates for the 1,234 PSUs in the simulation 144
Figure B-2. Comparison of low education vs. the predictions under the matched model using only the fixed effects for the 1,234 PSUs in the simulation, based on the entire ACS population... 158
Figure B-3. Comparison of low education vs. the predictions under the unmatched model using only the fixed effects for the 1,234 PSUs in the simulation, based on the entire ACS population... 159
Figure B-4. Comparison of low education vs. the predictions of a linear unit-level model using only the fixed effects for the 1,234 PSUs in the simulation, based on the entire ACS population... 160
Figure B-5. Comparison of average MSE in predicting low education, 100 simulations of M13 with inverse Wishart vs. LKJ priors 161
Figure B-6. Comparison of average posterior variance in predicting low education, 100 simulations of M13 with inverse Wishart vs. LKJ priors 162
Figure B-7. Comparison of actual average MSE to the average posterior variance in predicting low education, 100 simulations of M13 with inverse Wishart prior 163
Figure B-8. Comparison of actual average MSE to the average posterior variance in predicting low education, 100 simulations of M13 with LKJ prior 163
Figure B-9. Comparison of average posterior variance in predicting low education, 20 simulations of M13, LKJ prior with default settings vs. LKJ with the final settings 164
Figure B-10. Comparison of average posterior variance in predicting low education, 20 simulations of M13, LKJ prior with default settings vs. LKJ with the final settings, with a revised... 165
Figure C-1. Small area estimates and credible intervals for states, by outcome: 2012/2014/2017 172
Figure C-2. Numeracy proportion - Histogram of differences between survey regression estimates and indirect estimates: 2012/2014/2017 185
Figure C-3. Numeracy proportion - Comparison between survey regression estimates and indirect estimates: 2012/2014/2017 186
Figure C-4. Numeracy proportion - Shrinkage plots of point estimates, by sample size: 2012/2014/2017 187
Figure C-5. Numeracy proportion - Indication of coverage by credible interval: 2012/2014/2017 188
Figure C-6. Numeracy proportion - Comparison of standard errors between model and smoothed approaches: 2012/2014/2017 189
Figure C-7. Literacy average - Histogram of differences between SRE and indirect estimates: 2012/2014/2017 190
Figure C-8. Literacy average - Scatterplot of SRE and indirect estimates, with sample size as bubbles: 2012/2014/2017 191
Figure C-9. Literacy average - Shrinkage plots of point estimates, by sample size: 2012/2014/2017 192
Figure C-10. Literacy average - Indication of coverage by credible interval: 2012/2014/2017 193
Figure C-11. Literacy average - Comparison of standard errors between model and smoothed approaches: 2012/2014/2017 194
Figure C-12. Numeracy average - Histogram of differences between SRE and indirect estimates: 2012/2014/2017 195
Figure C-13. Numeracy average - Scatterplot of SRE and indirect estimates, with sample size as bubbles: 2012/2014/2017 196
Figure C-14. Numeracy average - Shrinkage plots of point estimates, by sample size: 2012/2014/2017 197
Figure C-15. Numeracy average - Indication of coverage by credible interval: 2012/2014/2017 198
Figure C-16. Numeracy average - Comparison of standard errors between model and smoothed approaches: 2012/2014/2017 199