Table of Contents

Title page

Contents

ACKNOWLEDGMENTS 4

1. INTRODUCTION 16

1.1. PIAAC Sample Design 18

1.2. Proficiency Measures in PIAAC 20

1.3. Types of Estimates, Areas of Interest, and Results Website 21

2. BACKGROUND 24

2.1. Approaches Applied to Literacy Data 25

2.2. Review of Major Federal SAE Programs 26

2.3. Review of Recent Developments in SAE 27

2.4. Key Features of PIAAC Indirect Estimation Methodology 28

3. DIRECT ESTIMATION 31

3.1. Direct County Estimates 32

3.2. Survey Regression Estimation (SRE) 34

3.3. Variance Smoothing 37

3.3.1. Variance Estimation Smoothing Model for Proportions 39

3.3.2. Variance Estimation Smoothing Model for Averages 40

4. COVARIATE SELECTION 42

4.1. Initial Identification of County and State Covariates 43

4.2. Initial Set of County and State Covariates Sources 44

4.2.1. Initial Set of Sources for County-Level Covariates 45

4.2.2. Initial Set of Sources for State-Level Covariates 46

4.3. Covariates Selection Process 47

4.3.1. Phase 1: Covariates Reduction 48

4.3.2. Phase 2: Cross Validation 52

5. MODEL DEVELOPMENT 56

5.1. Summary of Simulation Results 57

5.2. Final SAE Models 58

5.2.1. Area-Level Bivariate HB Linear Three-Fold Model for Proportions 58

5.2.2. Modeling Averages: Area-Level Univariate HB Linear Three-Fold Model 61

5.3. Model Fitting 61

5.3.1. Software 62

5.3.2. Model Estimation 63

5.4. Predicted Values 68

5.4.1. Indirect Estimates for Sampled Counties 69

5.4.2. Indirect Estimates for Nonsampled Counties 69

5.4.3. Indirect Estimates for States and Nation 70

5.5. Measures of Precision for the Indirect Estimates 71

5.5.1. Credible Intervals 71

5.5.2. Coefficient of Variation 71

5.5.3. Assessment of Precision Measures 72

5.6. Simultaneous Inference 73

6. MODEL DIAGNOSTICS, SENSITIVITY ASSESSMENT, AND EVALUATION 76

6.1. Internal Model Validation 76

6.1.1. Convergence and Mixing Diagnostics 78

6.1.2. Checks on Model Assumptions 80

6.1.3. Model Sensitivity Checks 86

6.1.4. Changes in the Model Specification 100

6.2. External Model Validation 104

6.2.1. Model Validation Graphs 104

6.2.2. Comparison of Aggregates of Model Predictions and Direct Estimates 110

7. SUMMARY 116

REFERENCES 117

APPENDIX A. LIST OF POTENTIAL COVARIATES 122

APPENDIX B. SIMULATION STUDY RESULTS 136

APPENDIX C. SELECT STUDY RESULTS 170

APPENDIX D. NEGATIVE ESTIMATES 221

Table 1-1. Number of completed cases for PIAAC samples: 2012/2014/2017 19

Table 1-2. Number of counties with at least one completed case: 2012/2014/2017 19

Table 1-3. Number of completed cases per county: 2012/2014/2017 19

Table 3-1. Distribution of the proportion of variance associated with multiple imputation for direct estimates across counties: 2012/2014/2017 33

Table 3-2. Summary of variance estimates prior to SRE, after SRE, and after smoothing: 2012/2014/2017 38

Table 3-3. Parameter estimates for the variance smoothing model for proportions: 2012/2014/2017 40

Table 3-4. Parameter estimates for the variance smoothing process for county-level variances for literacy and numeracy average: 2012/2014/2017 41

Table 4-1. List of phase 1 select covariates, including their label, source, and year 51

Table 4-2. Predictor variables selected in phase 1, by the outcome model and LASSO lambda option 52

Table 4-3. Covariates used in cross validation for literacy proportions and results of summed squared differences between predicted proportions and direct estimates: 2012/2014/2017 54

Table 4-4. List of covariates for the final small area models 55

Table 4-5. Correlation coefficients among covariates for the final small area model: 2012/2014/2017 55

Table 5-1. Initial parameter values of β for literacy and numeracy proportions and averages: 2012/2014/2017 63

Table 5-2. Regression coefficients and components of the variance-covariance matrices of random effects for the final HB models: For literacy and numeracy proportions: 2012/2014/2017 66

Table 5-3. Regression coefficients and variances of random effects for the final HB models: For literacy and numeracy averages: 2012/2014/2017 68

Table 5-4. National-level indirect and direct estimates: 2012/2014/2017 70

Table 5-5. Distribution of credible interval widths and coefficients of variation for indirect estimates for literacy proportion at or below Level 1: 2012/2014/2017 72

Table 6-1. Convergence diagnostics for the MCMC: 2012/2014/2017 80

Table 6-2. Variance inflation factors: 2012/2014/2017 81

Table 6-3. Posterior predictive checks for bivariate HB model: Summaries of posterior predictive statistics for literacy proportions at or below Level 1: 2012/2014/2017 86

Table 6-4. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices (LKJ priors for the Cholesky factors of the decomposed matrices):... 87

Table 6-5. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices (IW priors for the matrices): 2012/2014/2017 91

Table 6-6. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices and fewer initial values: 2012/2014/2017 94

Table 6-7. Convergence diagnostics for the MCMC using alternative specification for the variance-covariance matrices and fewer initial values and noninformative choice of hyperparameters:... 97

Table 6-8. Convergence diagnostics for the MCMC, using a smaller number of MC samples and default software parameters for the sampling algorithms: 2012/2014/2017 103

Table 6-9. Comparison of aggregated county-level indirect and direct estimates for Literacy P1, by subgroup: 2012/2014/2017 112

Figure 1-1. Number of completed cases by states with at least one completed case, sorted by number of completed cases: 2012/2014/2017 20

Figure 4-1. Covariate selection process diagram 43

Figure 5-1. Illustration of county-to-county comparison: 2012/2014/2017 74

Figure 6-1. Residual plots for the first set of residuals: 2012/2014/2017 82

Figure 6-2. Residual plots for the second set of residuals (conditional on the random effects components): 2012/2014/2017 83

Figure 6-3. Posterior means and standard deviations for the regression coefficients under HB models with LKJ prior on the correlation matrix versus LKJ prior on the Cholesky factor... 88

Figure 6-4. Posterior means and standard deviations for county-level literacy proportions under HB models with LKJ prior on the correlation matrix versus LKJ prior on the Cholesky factor... 89

Figure 6-5. Posterior means and standard deviations for the regression coefficients under HB models with LKJ prior on the correlation matrix versus IW prior on the variance matrix:... 92

Figure 6-6. Posterior means and standard deviations for county-level literacy proportions under HB models with LKJ prior on the correlation matrix versus IW prior on the variance matrix:... 93

Figure 6-7. Posterior means and standard deviations for the regression coefficients under HB models with IW prior on the variance matrix and different sets of initial values: 2012/2014/2017 95

Figure 6-8. Posterior means and standard deviations for county-level literacy proportions under HB models with IW prior on the variance matrix and different sets of initial values:... 96

Figure 6-9. Posterior means and standard deviations for the regression coefficients under HB models with IW prior on the variance matrix and different sets of hyperparameters:... 98

Figure 6-10. Posterior means and standard deviations for county-level literacy proportions under HB models with IW prior on the variance matrix and different sets of hyperparameters:... 99

Figure 6-11. Posterior means and standard deviations for regression coefficients under univariate and bivariate HB models: 2012/2014/2017 101

Figure 6-12. Posterior means and standard deviations for county-level literacy proportions under univariate and bivariate HB models: 2012/2014/2017 102

Figure 6-13. Literacy proportion: Histograms of differences between survey regression estimates and indirect estimates: 2012/2014/2017 106

Figure 6-14. Literacy proportion: Shrinkage plots of point estimates, by sample size: 2012/2014/2017 107

Figure 6-15. Literacy proportion: Indication of coverage by credible interval: 2012/2014/2017 108

Figure 6-16. Literacy proportion: Comparison between survey regression estimates and indirect estimates: 2012/2014/2017 109

Figure 6-17. Literacy proportion: Comparison between model standard errors and smoothed standard errors: 2012/2014/2017 110

Table A-1. List of county-level variables, by source and year 123

Table A-2. List of state-level variables, by source and year 128

Table A-3. PIAAC county- and state-level variable correlations with literacy/numeracy proficiency outcomes: 2012/2014/2017 130

Table A-4. PIAAC county- and state-level variable LASSO selection results with literacy/numeracy proficiency outcomes: 2012/2014/2017 135

Table B-1. Ratio of the estimated true sampling variance of the PSU-level survey regression estimate to the average direct estimate for the simulation PSUs 145

Table B-2. Features of the models studied, comparing unmatched, "x" (vs. matched, "."), use of the estimated true variance, "x" (vs. smoothed "."), STAN, "x" (vs. JAGS, "."), use of the... 147

Table B-3. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, averaged over all PSUs and for groups of PSUs,... 149

Table B-4. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, for groups of PSUs, classified by the percentage... 150

Table B-5. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, for groups of PSUs, classified by the modeled... 151

Table B-6. Average mean square errors (×10⁴) based on 500 simulated samples under different HB area-level models for low education, state-level estimates 153

Table B-7. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 155

Table B-8. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 156

Table B-9. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 157

Table B-10. Average percentage of coverage of 95 percent credible intervals produced by MCMC based on 500 simulated samples under different HB area-level models for low education,... 158

Table C-1. Different sets of covariates used in cross validation for literacy average: 2012/2014/2017 180

Table C-2. Different sets of covariates used in cross validation for numeracy proportions: 2012/2014/2017 181

Table C-3. Different sets of covariates used in cross validation for numeracy average: 2012/2014/2017 181

Table C-4. Distribution of credible interval widths and coefficients of variation for small area estimates: 2012/2014/2017 182

Table C-5. Evaluation of aggregate estimates for literacy proportion at Level 2: 2012/2014/2017 200

Table C-6. Evaluation of aggregate estimates for literacy proportion at or above Level 3: 2012/2014/2017 203

Table C-7. Evaluation of aggregate estimates for literacy average: 2012/2014/2017 206

Table C-8. Evaluation of aggregate estimates for numeracy proportion at or below Level 1: 2012/2014/2017 209

Table C-9. Evaluation of aggregate estimates for numeracy proportion at Level 2: 2012/2014/2017 212

Table C-10. Evaluation of aggregate estimates for numeracy proportion at or above Level 3: 2012/2014/2017 215

Table C-11. Evaluation of aggregate estimates for numeracy average: 2012/2014/2017 218

Table D-1. Counties with negative indirect estimates for literacy proportion at or above Level 3: 2012/2014/2017 222

Table D-2. Counties with negative indirect estimates for numeracy proportion at or above Level 3: 2012/2014/2017 222

Table D-3. Counties with positive indirect estimate and negative lower bound of credible intervals for literacy proportion at or below Level 1: 2012/2014/2017 223

Table D-4. Counties with positive indirect estimate and negative lower bound of credible intervals for literacy proportion at or above Level 3: 2012/2014/2017 223

Table D-5. Counties with positive indirect estimates and negative lower bound of credible intervals for numeracy proportion at or above Level 3: 2012/2014/2017 223

Figure B-1. Comparison of estimated true variances of the direct and survey regression estimates for the 1,234 PSUs in the simulation 144

Figure B-2. Comparison of low education vs. the predictions under the matched model using only the fixed effects for the 1,234 PSUs in the simulation, based on the entire ACS population... 158

Figure B-3. Comparison of low education vs. the predictions under the unmatched model using only the fixed effects for the 1,234 PSUs in the simulation, based on the entire ACS population... 159

Figure B-4. Comparison of low education vs. the predictions of a linear unit-level model using only the fixed effects for the 1,234 PSUs in the simulation, based on the entire ACS population... 160

Figure B-5. Comparison of average MSE in predicting low education, 100 simulations of M13 with inverse Wishart vs. LKJ priors 161

Figure B-6. Comparison of average posterior variance in predicting low education, 100 simulations of M13 with inverse Wishart vs. LKJ priors 162

Figure B-7. Comparison of actual average MSE to the average posterior variance in predicting low education, 100 simulations of M13 with inverse Wishart prior 163

Figure B-8. Comparison of actual average MSE to the average posterior variance in predicting low education, 100 simulations of M13 with LKJ prior 163

Figure B-9. Comparison of average posterior variance in predicting low education, 20 simulations of M13, LKJ prior with default settings vs. LKJ with the final settings 164

Figure B-10. Comparison of average posterior variance in predicting low education, 20 simulations of M13, LKJ prior with default settings vs. LKJ with the final settings, with a revised... 165

Figure C-1. Small area estimates and credible intervals for states, by outcome: 2012/2014/2017 172

Figure C-2. Numeracy proportion: Histogram of differences between survey regression estimates and indirect estimates: 2012/2014/2017 185

Figure C-3. Numeracy proportion: Comparison between survey regression estimates and indirect estimates: 2012/2014/2017 186

Figure C-4. Numeracy proportion: Shrinkage plots of point estimates, by sample size: 2012/2014/2017 187

Figure C-5. Numeracy proportion: Indication of coverage by credible interval: 2012/2014/2017 188

Figure C-6. Numeracy proportion: Comparison of standard errors between model and smoothed approaches: 2012/2014/2017 189

Figure C-7. Literacy average: Histogram of differences between SRE and indirect estimates: 2012/2014/2017 190

Figure C-8. Literacy average: Scatterplot of SRE and indirect estimates, with sample size as bubbles: 2012/2014/2017 191

Figure C-9. Literacy average: Shrinkage plots of point estimates, by sample size: 2012/2014/2017 192

Figure C-10. Literacy average: Indication of coverage by credible interval: 2012/2014/2017 193

Figure C-11. Literacy average: Comparison of standard errors between model and smoothed approaches: 2012/2014/2017 194

Figure C-12. Numeracy average: Histogram of differences between SRE and indirect estimates: 2012/2014/2017 195

Figure C-13. Numeracy average: Scatterplot of SRE and indirect estimates, with sample size as bubbles: 2012/2014/2017 196

Figure C-14. Numeracy average: Shrinkage plots of point estimates, by sample size: 2012/2014/2017 197

Figure C-15. Numeracy average: Indication of coverage by credible interval: 2012/2014/2017 198

Figure C-16. Numeracy average: Comparison of standard errors between model and smoothed approaches: 2012/2014/2017 199