Table of contents

Contents 1

유용성과 노출 위험성 지표를 이용한 재현자료 기법 비교 연구 = A comparison of synthetic data approaches using utility and disclosure risk measures / 안성빈 ; 트랑 도안 ; 이주희 ; 김지우 ; 김용재 ; 김윤지 ; 윤창원 ; 정성규 ; 김동하 ; 권성훈 ; 김항준 ; 안정연 ; 박철우 1

Abstract 1

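1. Introduction 1

2. Synthetic data generation methods 3

2.1. Description of the SURVEY EST dataset 3

2.2. Synthetic data generation using sequential regression models 4

2.3. Synthetic data generation using nonparametric Bayesian models 6

2.4. Synthetic data generation using artificial neural networks 7

2.5. Characteristics and differences of the synthetic data generation methods 9

3. Evaluation measures for synthetic data 10

3.1. Utility measures 10

3.2. Disclosure risk measures for synthetic data 12

3.3. α-precision, β-recall, and authenticity score 14

3.4. Characteristics and differences of the evaluation measures 15

4. Comparative analysis of the synthetic data methods 18

5. Conclusion 20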

Appendix 20

References 23

Summary (in Korean) 26

References (40 items): data provided by Naver Academic (네이버학술정보)

No. Reference Held by the National Assembly Library
1 Alaa A, Van Breugel B, Saveliev ES, and van der Schaar M (2022). How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models, International Conference on Machine Learning, 290–306, PMLR. Not held
2 Arjovsky M, Chintala S, and Bottou L (2017). Wasserstein generative adversarial networks, International Conference on Machine Learning, 214–223, PMLR. Not held
3 Arthur D and Vassilvitskii S (2007). k-means++: The advantages of careful seeding, In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, USA, 1027–1035. Not held
4 Breiman L, Friedman JH, Olshen RA, and Stone CJ (2017). Classification and Regression Trees, Routledge, New York. Not held
5 Dhariwal P and Nichol A (2021). Diffusion models beat GANs on image synthesis, Advances in Neural Information Processing Systems, 34, 8780–8794. Not held
6 Drechsler J and Reiter JP (2009). Disclosure risk and data utility for partially synthetic data: An empirical study using the German IAB establishment survey, Journal of Official Statistics, 25, 589–603. Not held
7 El Emam K, Mosquera L, and Bass J (2020). Evaluating identity disclosure risk in fully synthetic health data: Model development and validation, Journal of Medical Internet Research, 22, e23139. Not held
8 Elliot M (2015). Final report on the disclosure risk associated with the synthetic data produced by the SYLLS team, Report 2015, 2. Not held
9 Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, and Courville AC (2017). Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems, 30, 1–11. Not held
10 Hilprecht B, Härterich M, and Bernau D (2019). Monte Carlo and reconstruction membership inference attacks against generative models, Proceedings on Privacy Enhancing Technologies, 2019, 232–249. Not held
11 Hu J and Savitsky TD (2018). Bayesian data synthesis and disclosure risk quantification: An application to the consumer expenditure surveys, Available from: arXiv preprint arXiv:1809.10074 Not held
12 Ishwaran H and James LF (2001). Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96, 161–173. Not held
13 Karr AF, Kohnen CN, Oganian A, Reiter JP, and Sanil AP (2006). A framework for evaluating the utility of data altered to protect confidentiality, The American Statistician, 60, 224–232. Not held
14 Khamis H (2008). Measures of association: How to choose?, Journal of Diagnostic Medical Sonography, 24, 155–162. Not held
15 Kingma DP and Welling M (2013). Auto-encoding variational Bayes, Available from: arXiv preprint arXiv:1312.6114 Not held
16 Kim HJ, Drechsler J, and Thompson KJ (2021). Synthetic microdata for establishment surveys under informative sampling, Journal of the Royal Statistical Society: Series A, 184, 255–281. Not held
17 Kim J and Park M-J (2019). Multiple imputation and synthetic data, The Korean Journal of Applied Statistics, 32, 83–97. Not held
18 Kullback S and Leibler RA (1951). On information and sufficiency, The Annals of Mathematical Statistics, 22, 79–86. Not held
19 Lee Y (2013). Review on statistical methods for protecting privacy and measuring risk of disclosure when releasing information for public use, Journal of the Korean Data and Information Science Society, 24, 1029–1041. Not held
20 Lin Z, Khetan A, Fanti G, and Oh S (2018). The power of two samples in generative adversarial networks, Advances in Neural Information Processing Systems, 31, 1–10. Not held
21 Little RJA (1993). Statistical analysis of masked data, Journal of Official Statistics, Stockholm, 9, 407–407. Not held
22 Markus H, Rudolf M, and Andreas E (2020). A baseline for attribute disclosure risk in synthetic data, In Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy (CODASPY’20), March 16–18, 2020, New Orleans, LA, USA, ACM, New York, NY, USA, 11, Available from: https://doi.org/10.1145/3374664.3375722 Not held
23 Murray JS and Reiter JP (2016). Multiple imputation of missing categorical and continuous values via Bayesian mixture models with local dependence, Journal of the American Statistical Association, 111, 1466–1479. Not held
24 Nowok B, Raab GM, and Dibben C (2016). Synthpop: Bespoke creation of synthetic data in R, Journal of Statistical Software, 74, 1–26. Not held
25 Park MJ, Kwon SP, and Shim KH (2013). Microdata masking for Survey of Household Finances and Living Conditions, Statistical Research Institute, Daejeon. Not held
26 Park M-J, Han J, and Park N (2020). Study on synthetic data generation methods with applications to Statistics Korea RDC data, Technical report, Statistical Research Institute. Not held
27 Raghunathan TE, Reiter JP, and Rubin DB (2003). Multiple imputation for statistical disclosure limitation, Journal of Official Statistics, 19, 1–16. Not held
28 Reiter JP (2003). Inference for partially synthetic, public use microdata sets, Survey Methodology, 29, 181–188. Not held
29 Reiter JP (2005). Using CART to generate partially synthetic public use microdata, Journal of Official Statistics, 21, 441–462. Not held
30 Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects, Biometrika, 70, 41–55. Not held
31 Rubin DB (1993). Statistical disclosure limitation, Journal of Official Statistics, 9, 461–468. Not held
32 Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, and Williamson RC (2001). Estimating the support of a high-dimensional distribution, Neural Computation, 13, 1443–1471. Not held
33 Snoke J, Raab GM, Nowok B, Dibben C, and Slavkovic A (2018). General and specific utility measures for synthetic data, Journal of the Royal Statistical Society: Series A, 181, 663–688. Not held
34 Song Y and Ermon S (2019). Generative modeling by estimating gradients of the data distribution, Advances in Neural Information Processing Systems, 32, 11895–11907. Not held
35 Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, and Poole B (2020). Score-based generative modeling through stochastic differential equations, International Conference on Learning Representations, Available from: https://arxiv.org/abs/2011.13456 Not held
36 Stan M, Jordi N, Morvarid S, and Tomasz S (2015). A review of attribute disclosure control, Advanced Research in Data Privacy, 567, 41–61. Not held
37 Villani C (2008). Optimal Transport: Old and New, Springer, New York. Not held
38 Woo M-J, Reiter JP, Oganian A, and Karr AF (2009). Global measures of data utility for microdata masked for disclosure limitation, Journal of Privacy and Confidentiality, 1, 111–124. Not held
39 Xu L, Skoularidou M, Cuesta-Infante A, and Veeramachaneni K (2019). Modeling tabular data using conditional GAN, Advances in Neural Information Processing Systems, 32, 7333–7343. Not held
40 Yoon J, Jarrett D, and van der Schaar M (2019). Time-series generative adversarial networks, Advances in Neural Information Processing Systems, 32, 5509–5519. Not held