
Table of Contents

Title Page

Abstract

Contents

Ⅰ. INTRODUCTION 9

Ⅱ. BACKGROUND 11

2.1. Curriculum Learning 11

2.2. Approaches for Solving Unsupervised Environment Design 12

2.3. Overcooked! Game 12

2.4. Proximal Policy Optimization 13

Ⅲ. METHOD 14

3.1. Modified Prioritized Level Replay 14

3.2. Level Scoring Metric for Learning Potential 17

3.2.1. Generalized Advantage Estimate 17

3.2.2. Minimax Return 17

3.2.3. Threshold 18

3.3. Dissimilarity between Levels 18

Ⅳ. EXPERIMENT 20

4.1. Experiment Settings 20

Ⅴ. RESULT 22

5.1. Zero-shot Performance Result 22

5.2. Difficulty of Regret Approximation 24

5.3. Ablation Studies on the Dissimilarity Metric 25

Ⅵ. CONCLUSION 28

REFERENCES 29

APPENDIX 33

A. Additional information about map generation 33

B. Generated map from different generator 34

C. Experiment Detail 35

D. Self-play PPO 36

E. The network structure of PPO agent 37

List of Tables

Table 1. Hyperparameters of map generator 21

List of Figures

Figure 1. Game levels in the online Overcooked game 12

Figure 2. Overview of Modified Prioritized Level Replay. The modified part is highlighted in red 14

Figure 3. Example of Hamming distance calculation 19

Figure 4. Samples of the training map with 7×5 size 20

Figure 5. Samples of the training map with 5×5 size 20

Figure 6. Block types of Overcooked map 20

Figure 7. Train and test performance with proposed level score metric 22

Figure 8. Index count of level according to the sampling method 23

Figure 9. Train and test performance with proposed level score metric 23

Figure 10. Mean episodic test rewards on each size of Overcooked map 25

Figure 11. Mean episodic train rewards on big map 25

Figure 12. Index count of level according to the dissimilarity metric 26

Figure 13. Index count of level according to the dissimilarity metric 26

Abstract

Reinforcement learning currently faces the challenge that agents tend to overfit to their training sets and generalize poorly even to small changes in their environment. To address this issue, recent studies have explored regret-based curriculum learning approaches to enhance the robustness of agents. These methods aim to accelerate learning by gradually providing agents with more challenging environments, without requiring prior domain knowledge. However, applying regret-based curriculum learning in a cooperative multi-agent setting presents difficulties. Unlike previous curriculum learning setups, which assume single-agent or competitive settings, each agent shares the same group reward and must account for the sub-optimal policy of the other agent. This makes it difficult to accurately estimate an agent's regret, which approximates the learning potential of an environment. In this paper, we present a sampling method suited to cooperative environments by augmenting previous sampling techniques with an environment-diversity metric based on the Hamming distance. In experiments on the Overcooked environment, the sampling method based on minimizing the agents' return demonstrates better zero-shot performance than random sampling. Furthermore, the proposed metric for measuring the dissimilarity between environments effectively mitigates the overfitting caused by repeatedly replaying a specific map.
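The dissimilarity idea described in the abstract (and outlined in Section 3.3 and Figure 3) can be illustrated with a short Python sketch. This is not the thesis's code: the grid encoding, the function names, and the re-weighting scheme below are assumptions made purely for illustration of a Hamming-distance-based level dissimilarity.

```python
import numpy as np

def hamming_dissimilarity(level_a: np.ndarray, level_b: np.ndarray) -> float:
    """Normalized Hamming distance between two equally sized grid maps."""
    assert level_a.shape == level_b.shape, "levels must share the same grid size"
    return float(np.mean(level_a != level_b))

def diversity_weights(candidate: np.ndarray, buffer: list) -> np.ndarray:
    """Hypothetical re-weighting: buffered maps that differ more from the
    candidate keep a higher chance of being replayed."""
    distances = np.array([hamming_dissimilarity(candidate, lvl) for lvl in buffer])
    total = distances.sum()
    return distances / total if total > 0 else np.full(len(buffer), 1.0 / len(buffer))

if __name__ == "__main__":
    # 'X' = counter/wall, ' ' = floor, 'P' = pot, 'O' = onion pile (assumed encoding)
    a = np.array([list("XXXXX"), list("XP OX"), list("XXXXX")])
    b = np.array([list("XXXXX"), list("XO PX"), list("XXXXX")])
    print(hamming_dissimilarity(a, b))  # 2 of 15 cells differ -> ~0.133
```

In this sketch, a candidate level that closely resembles maps already in the replay buffer receives a low weight, which captures the intuition stated in the abstract: using dissimilarity keeps the sampler from repeatedly replaying a specific map.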