Table of Contents

Title Page

Abstract

Contents

1. Introduction

2. Related Works

2.1. Teacher-Student Learning

2.2. Orthogonal Initialization and Orthogonal Regularization

3. Problem Setup

3.1. Alignment

4. Negative Alignment

4.1. Exploratory Experiments

4.2. On Spurious Local Minima

5. Orthogonal Regularization

5.1. Orthogonal Initialization

5.2. Orthogonal Regularization

6. Conclusion

References

Abstract in Korean

List of Tables

Table 1. Teacher-Student Random Initialization: Frequency of Negative Alignment Occurrences. The number of experiments exhibiting various counts of...

Table 2. Teacher Orthogonal - Student Random Initialization: Incidence of Negative Alignment. This presents the frequency of negative alignment occur-...

Table 3. Teacher-Student Orthogonal Initialization: Analysis of Negative Alignment Frequency. This table details the occurrences of negative align-...

Table 4. Training loss when the regularization coefficient λ is set to 1.

Table 5. Training loss when the regularization coefficient λ is set to 1.

Table 6. Training Outcomes with Orthogonal Regularization. Upper Section: Results when the regularization coefficient λ is set to 1. Lower Section: Results...

List of Figures

Figure 1. Cosine Similarity Evolution (m=k=12): This graph depicts the progression of cosine similarity between teacher and student weight vectors...

Figure 2. Heatmap Visualization of Cosine Similarity (m=k=12): This series of heatmaps displays the cosine values between aligned teacher and...

Figure 3. Negative Alignment Cosine Value Curves (m=k=12): This graph illustrates the cosine values of teacher and student weight vectors exhibiting...

Figure 4. Heatmaps of Negative-Aligned Cosine Values (m=k=12): These heatmaps present the cosine values of negatively aligned teacher and student...

Figure 5. 3D Loss Landscape of Negative-Aligned Vectors (m=k=12): This figure represents a three-dimensional visualization of the loss landscape...

Figure 6. Norm Comparison of Aligned and Negative-Aligned Weights: This figure displays the norm values of both aligned and negative-aligned teacher...

Abstract

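 This thesis introduces "negative alignment," a new phenomenon observed in teacher-student network learning with ReLU activation functions. It has generally been understood that the weights of a sufficiently wide student network tend to align with those of the teacher network. Our main finding challenges this understanding, particularly when the two networks have similar widths. Negative alignment occurred frequently even when we applied strategies such as generating the teacher weights orthogonally and either initializing the student weights orthogonally or including orthogonal regularization during training. This phenomenon, in which the weights of the trained student network align in the direction opposite to those of the teacher, reveals a previously unobserved complexity in neural network training, especially when the network widths are similar.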
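The two quantities the abstract relies on, the cosine between teacher and student weight vectors and an orthogonality penalty on a weight matrix, can be illustrated with a minimal sketch. This record does not give the thesis's exact definitions, so everything below is an assumption: the matching rule (largest absolute cosine), the `threshold` of 0.9, the penalty form (squared Frobenius norm of W Wᵀ − I, a common soft orthogonality term), and all function names are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_students(teacher, student):
    """For each student row, find the teacher row with the largest |cosine|.

    Returns one (teacher_index, cosine) pair per student row.
    Assumption: matching by absolute cosine is an illustrative choice.
    """
    matches = []
    for s in student:
        cosines = [cosine(t, s) for t in teacher]
        best = max(range(len(teacher)), key=lambda j: abs(cosines[j]))
        matches.append((best, cosines[best]))
    return matches

def negatively_aligned(matches, threshold=0.9):
    """Student indices whose best match has cosine <= -threshold.

    Assumption: the threshold is hypothetical, not taken from the thesis.
    """
    return [i for i, (_, c) in enumerate(matches) if c <= -threshold]

def orthogonal_penalty(weights):
    """Squared Frobenius norm of W W^T - I for a list-of-rows matrix W."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gram = sum(a * b for a, b in zip(weights[i], weights[j]))
            target = 1.0 if i == j else 0.0
            total += (gram - target) ** 2
    return total

# Toy example: the second student row points opposite to the second teacher row.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.9, 0.1], [0.0, -2.0]]
matches = match_students(teacher, student)
flagged = negatively_aligned(matches)
```

In this toy case the second student vector is flagged, since its cosine with its matched teacher vector is -1. The penalty is zero for the orthonormal `teacher` rows and positive for `student`; in a regularized training loss such a term would typically be scaled by a coefficient such as the λ named in the table captions.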