Title Page
Table of Contents
Abstract (Korean) 9
ABSTRACT 10
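I. Introduction 11
II. Research Subject 13
2.1. Analysis Data 13
2.1.1. Data Description 13
2.1.2. Data Preprocessing 16
2.2. Evaluation Metrics 18
2.2.1. Mean Squared Error 18
2.2.2. Concordance Index 18
2.2.3. Adjusted Coefficient of Determination 19
2.2.4. Precision-Recall Curve 20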
III. Research Methods 22
3.1. Convolutional Neural Network 22
3.2. Ensemble Learning 24
3.3. Quadratic Programming 25
3.4. Representation Learning 26
3.5. Self-Supervised Learning 27
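IV. Performance Improvement Methods 28
4.1. Quadratic Programming Model 28
4.2. Self-Supervised Learning Model 30
V. Experimental Results 32
5.1. Quadratic Programming Experimental Results 32
5.2. Self-Supervised Learning Experimental Results 34
VI. Conclusion and Suggestions 39
References 40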
Table 2.1. Data set 13
Table 2.2. Descriptive statistics of SMILES, sequence and binding affinity 15
Table 5.1. Comparison of models via 3-fold and 5-fold cross validations 33
Table 5.2. Comparison of self-supervised learning and semi-supervised learning models via 3-fold cross validations 35
Table 5.3. Comparison of self-supervised learning and semi-supervised learning models via 5-fold cross validations 36
Figure 2.1. Histogram of binding affinity values 15
Figure 2.2. Histograms of length of SMILES (a) and length of protein sequence (b) 15
Figure 2.3. Confusion matrix 21
Figure 3.1. CNN architecture and training process 23
Figure 3.2. Structure of representation learning 26
Figure 4.1. Similarity-based CNN model 29
Figure 4.2. Quadratic programming in ensemble learning process 29
Figure 4.3. Self-supervised learning model process 31
Figure 4.4. The overall process 31
Figure 5.1. Boxplot for 3-fold cross validation 37
Figure 5.2. Boxplot for 5-fold cross validation 38