National Assembly Library of Korea

Thesis: Comparison of re-sampling methods to improve prediction in analysis of highly censored survival data

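Author
최재윤
Publication
Seoul : Graduate School, Korea University, 2022.2
Call number
TM 610.15195 -22-6
Physical description
x, 88 p. ; 26 cm
Reading room
Electronic resources
Control number
KDMT12022000037770
Notes
Thesis (Master's) -- Graduate School, Korea University, Interdisciplinary Program in Medical Statistics, 2022.2. Advisor: 안형진
Full text
External full text via the KERIS academic research information service (학술연구정보서비스)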

Table of Contents

Title page

Abstract

Ⅰ. Introduction 14

Ⅱ. Methods 17

1. Survival data analysis 17

1) Survival data and censoring 17

2) Cox proportional hazards model 17

2. Oversampling 19

1) Random oversampling 19

2) SMOTE (Synthetic Minority Oversampling Technique) 19

3) Borderline SMOTE 21

4) ADASYN (Adaptive Synthetic sampling) 22

3. Undersampling 23

1) Random undersampling 23

2) Edited Nearest Neighbor 24

3) Tomek link 25

4) One Sided Selection 26

Ⅲ. Simulation study 27

1. Purpose of the simulation 27

2. Simulation design 28

3. Evaluation of simulation results 30

1) Bias 30

2) C-index 30

3) iAUC (integrated Area Under Curve) 31

4. Simulation results 33

1) Bias 34

2) C-index 51

3) iAUC 73

Ⅳ. Discussion 95

Ⅴ. Conclusion 97

References 98

Abstract in Korean 100

List of Tables

Table 1. Simulation scenarios 29

Table 2. Bias of estimated coefficients in Cox proportional hazards model in simulation 1 (n＝250) 35

Table 3. Bias of estimated coefficients in Cox proportional hazards model in simulation 1 (n＝500) 36

Table 4. Bias of estimated coefficients in Cox proportional hazards model in simulation 1 (n＝1,000) 37

Table 5. Bias of estimated coefficients in Cox proportional hazards model in simulation 1 (n＝5,000) 38

Table 6. Bias of estimated coefficients in Cox proportional hazards model in simulation 2 (n＝250) 39

Table 7. Bias of estimated coefficients in Cox proportional hazards model in simulation 2 (n＝500) 40

Table 8. Bias of estimated coefficients in Cox proportional hazards model in simulation 2 (n＝1,000) 41

Table 9. Bias of estimated coefficients in Cox proportional hazards model in simulation 2 (n＝5,000) 42

Table 10. Bias of estimated coefficients in Cox proportional hazards model in simulation 3 (n＝250) 43

Table 11. Bias of estimated coefficients in Cox proportional hazards model in simulation 3 (n＝500) 45

Table 12. Bias of estimated coefficients in Cox proportional hazards model in simulation 3 (n＝1,000) 47

Table 13. Bias of estimated coefficients in Cox proportional hazards model in simulation 3 (n＝5,000) 49

Table 14. C-index of Cox proportional hazard model with oversampling (n＝250) 53

Table 15. C-index of Cox proportional hazard model with undersampling (n＝250) 54

Table 16. C-index of Cox proportional hazard model with oversampling (n＝500) 55

Table 17. C-index of Cox proportional hazard model with undersampling (n＝500) 56

Table 18. C-index of Cox proportional hazard model with oversampling (n＝1,000) 57

Table 19. C-index of Cox proportional hazard model with undersampling (n＝1,000) 58

Table 20. C-index of Cox proportional hazard model with oversampling (n＝5,000) 59

Table 21. C-index of Cox proportional hazard model with undersampling (n＝5,000) 60

Table 22. integrated AUC of Cox proportional hazard model with oversampling (n＝250) 75

Table 23. integrated AUC of Cox proportional hazard model with undersampling (n＝250) 76

Table 24. integrated AUC of Cox proportional hazard model with oversampling (n＝500) 77

Table 25. integrated AUC of Cox proportional hazard model with undersampling (n＝500) 78

Table 26. integrated AUC of Cox proportional hazard model with oversampling (n＝1,000) 79

Table 27. integrated AUC of Cox proportional hazard model with undersampling (n＝1,000) 80

Table 28. integrated AUC of Cox proportional hazard model with oversampling (n＝5,000) 81

Table 29. integrated AUC of Cox proportional hazard model with undersampling (n＝5,000) 82

List of Figures

Figure 1. Random oversampling 19

Figure 2. SMOTE 20

Figure 3. Borderline SMOTE 22

Figure 4. Random undersampling 24

Figure 5. Edited Nearest Neighbor 25

Figure 6. Tomek link 26

Figure 7. One Sided Selection 26

Figure 8. C-index of Cox proportional hazard model in scenario 1 (n＝250) 61

Figure 9. C-index of Cox proportional hazard model in scenario 2 (n＝250) 62

Figure 10. C-index of Cox proportional hazard model in scenario 3 (n＝250) 63

Figure 11. C-index of Cox proportional hazard model in scenario 1 (n＝500) 64

Figure 12. C-index of Cox proportional hazard model in scenario 2 (n＝500) 65

Figure 13. C-index of Cox proportional hazard model in scenario 3 (n＝500) 66

Figure 14. C-index of Cox proportional hazard model in scenario 1 (n＝1,000) 67

Figure 15. C-index of Cox proportional hazard model in scenario 2 (n＝1,000) 68

Figure 16. C-index of Cox proportional hazard model in scenario 3 (n＝1,000) 69

Figure 17. C-index of Cox proportional hazard model in scenario 1 (n＝5,000) 70

Figure 18. C-index of Cox proportional hazard model in scenario 2 (n＝5,000) 71

Figure 19. C-index of Cox proportional hazard model in scenario 3 (n＝5,000) 72

Figure 20. integrated AUC of Cox proportional hazard model in scenario 1 (n＝250) 83

Figure 21. integrated AUC of Cox proportional hazard model in scenario 2 (n＝250) 84

Figure 22. integrated AUC of Cox proportional hazard model in scenario 3 (n＝250) 85

Figure 23. integrated AUC of Cox proportional hazard model in scenario 1 (n＝500) 86

Figure 24. integrated AUC of Cox proportional hazard model in scenario 2 (n＝500) 87

Figure 25. integrated AUC of Cox proportional hazard model in scenario 3 (n＝500) 88

Figure 26. integrated AUC of Cox proportional hazard model in scenario 1 (n＝1,000) 89

Figure 27. integrated AUC of Cox proportional hazard model in scenario 2 (n＝1,000) 90

Figure 28. integrated AUC of Cox proportional hazard model in scenario 3 (n＝1,000) 91

Figure 29. integrated AUC of Cox proportional hazard model in scenario 1 (n＝5,000) 92

Figure 30. integrated AUC of Cox proportional hazard model in scenario 2 (n＝5,000) 93

Figure 31. integrated AUC of Cox proportional hazard model in scenario 3 (n＝5,000) 94

Abstract

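Purpose: Data in which the classes of the response variable are imbalanced are common in practice. When predicting from class-imbalanced data, predictive performance for the minority class tends to be poor. Re-sampling methods address this by drawing samples from the existing data so that the classes become balanced. They fall into two groups: oversampling, which increases the number of minority-class observations, and undersampling, which reduces the number of majority-class observations. In survival data, censored observations make it difficult to identify patients' risk factors and to predict their risk. This thesis therefore treats censoring status as the class label and evaluates whether applying re-sampling methods improves predictive performance when the censoring rate is high.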
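The following is a minimal sketch of the two simplest strategies described above, random oversampling and random undersampling, with censoring status used as the class label. It is not the thesis's code: the record layout `(covariates, time, event)` and the helper names `split_by_status`, `random_oversample` and `random_undersample` are hypothetical, and the synthetic methods (SMOTE, Borderline SMOTE, ADASYN) and the cleaning methods (Edited Nearest Neighbor, Tomek link, One Sided Selection) are not shown.

```python
import random

# Hypothetical record layout: (covariates, time, event), where event == 1 means
# the event was observed and event == 0 means the observation was censored.
# Censoring status is used as the class label for re-sampling.


def split_by_status(data):
    """Split records into (observed events, censored observations)."""
    events = [r for r in data if r[2] == 1]
    censored = [r for r in data if r[2] == 0]
    return events, censored


def random_oversample(data, seed=0):
    """Duplicate randomly chosen minority-class records (with replacement)
    until both classes have the same size. Assumes the minority class is non-empty."""
    rng = random.Random(seed)
    minority, majority = sorted(split_by_status(data), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(data, seed=0):
    """Keep every minority-class record plus a random subset (without
    replacement) of majority-class records of the same size."""
    rng = random.Random(seed)
    minority, majority = sorted(split_by_status(data), key=len)
    return minority + rng.sample(majority, len(minority))


# Toy data with a 90% censoring rate: 1 observed event, 9 censored records.
data = [([float(i)], float(i + 1), 1 if i == 0 else 0) for i in range(10)]
over = random_oversample(data)
under = random_undersample(data)
print(len(over), sum(r[2] for r in over))    # 18 records, 9 of them events
print(len(under), sum(r[2] for r in under))  # 2 records, 1 of them an event
```

Duplicated records keep their original survival time and event status, so oversampling changes only how often each observed event is seen by the model, while undersampling discards censored records entirely.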

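Methods: Survival data with various censoring rates are generated by simulation, and re-sampling methods are applied with censoring status treated as the class label. The oversampling methods are random oversampling, SMOTE, Borderline SMOTE, and ADASYN; the undersampling methods are random undersampling, Edited Nearest Neighbor, Tomek link, and One Sided Selection. After re-sampling, a Cox proportional hazards model is fitted, and the bias of the estimated coefficients is assessed first. Finally, the predictive performance of the Cox proportional hazards model under each re-sampling method is evaluated comprehensively with the C-index and iAUC.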
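To make the simulation and evaluation steps concrete, here is a sketch of generating censored survival data and scoring a risk score with Harrell's C-index. The data-generating mechanism is an assumption for illustration only (one standard-normal covariate, exponential event times satisfying proportional hazards, independent exponential censoring whose hazard `censor_hazard` controls the censoring rate); the thesis's actual scenarios are in its Table 1 and are not reproduced here. Model fitting and iAUC are omitted, and the true linear predictor stands in for a fitted Cox risk score. The function names are hypothetical.

```python
import math
import random


def simulate_survival(n, beta, censor_hazard, seed=0):
    """Hypothetical generator (not the thesis's design): x ~ N(0, 1), event
    times exponential with hazard exp(beta * x), so the Cox proportional
    hazards model holds, and independent exponential censoring times with
    constant hazard censor_hazard. Larger censor_hazard -> more censoring."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    event_times = [rng.expovariate(math.exp(beta * xi)) for xi in x]
    censor_times = [rng.expovariate(censor_hazard) for _ in range(n)]
    times = [min(t, c) for t, c in zip(event_times, censor_times)]
    events = [1 if t <= c else 0 for t, c in zip(event_times, censor_times)]
    return x, times, events


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: among comparable pairs (the subject with the shorter
    observed time had an observed event), the fraction in which that subject
    also has the higher risk score; tied risk scores count 1/2.
    Pairs with tied observed times are skipped in this sketch."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Usage: with censor_hazard=5.0 most observations are censored. Because
# beta > 0, ranking by x is the same as ranking by the true predictor beta * x.
x, times, events = simulate_survival(500, beta=1.0, censor_hazard=5.0)
print("censoring rate:", 1 - sum(events) / len(events))
print("C-index:", harrell_c_index(times, events, x))
```

Only subjects with an observed event can anchor a comparable pair, which is why heavy censoring shrinks the information available to the C-index; that is the setting the re-sampling comparison targets.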

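Results: In the simulations evaluating predictive performance after re-sampling survival data with different censoring rates, applying no re-sampling gave higher predictive performance than applying re-sampling when the censoring rate was 30%. When the censoring rate was 70% or 90%, the oversampling methods generally performed worse than no re-sampling, whereas the undersampling methods other than random undersampling showed slightly better predictive performance, although the differences were marginal.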

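Conclusion: When the event of interest occurs rarely, as in rare diseases, predicting it well is important. In such settings, where the censoring rate is high, it is worth considering whether re-sampling methods can improve predictive performance. In the simulations the improvement was small, but when even a modest gain in predictive performance is needed, re-sampling methods can be a good alternative if used together with an appropriate modeling approach.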
