Title Page
Abstract
Contents
Ⅰ. Introduction
1.1. Contributions
1.2. Organization
Ⅱ. Background
2.1. Heterogeneous Embedded Systems and Inference
2.2. Heterogeneous Memory Systems
Ⅲ. Related Work
3.1. Model Slicing and Execution for Efficient Deep Learning Inference
3.2. Data Placement and Migration for High-Performance Deep Learning
3.3. Resource Management for QoS-Aware and Efficient Workload Consolidation
Ⅳ. Heterogeneity-, Communication-, and Constraint-Aware Model Slicing and Execution for Accurate and Efficient Inference
4.1. Introduction
4.2. Experimental Methodology
4.3. Need for Heterogeneity-, Communication-, and Constraint-Aware Inference
4.4. Design and Implementation
4.4.1. Inference Workload Profiler
4.4.2. Execution and Communication Cost Estimators
4.4.3. Model Slicer and Scheduler
4.4.4. Inference Workload Executor
4.5. Evaluation
4.5.1. Overview
4.5.2. Inference Latency
4.5.3. Inference Energy
4.5.4. Impact of the MOSAIC Components
4.5.5. Discussion
4.6. Summary
Ⅴ. Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems
5.1. Introduction
5.2. Background: Deep Q-Network
5.3. Design and Implementation
5.3.1. Profiler
5.3.2. Execution and Communication Cost Estimators
5.3.3. Model Slicing and Execution Planner
5.3.4. Runtime System
5.4. Experimental Methodology
5.5. Evaluation
5.5.1. Overview
5.5.2. Inference Latency and Energy Efficiency
5.5.3. Sensitivity
5.5.4. Generality
5.5.5. Energy-Delay Product Efficiency
5.5.6. Training Time
5.6. Summary
Ⅵ. Hotness- and Lifetime-Aware Data Placement and Migration for High-Performance Deep Learning on Heterogeneous Memory Systems
6.1. Introduction
6.2. Background
6.2.1. TensorFlow Machine-Learning System
6.2.2. NUMA-Aware Memory Policies
6.2.3. Heterogeneous Memory Systems
6.2.4. Terminology
6.3. Experimental Methodology
6.3.1. System Configuration
6.3.2. Deep-Learning Applications
6.4. Characterization of DL Applications
6.4.1. Execution Time Characteristics
6.4.2. Tensor Characteristics
6.5. Design and Implementation
6.5.1. Tensor Hotness Analyzer
6.5.2. Tensor Lifetime Analyzer
6.5.3. Tensor Combiner
6.5.4. Tensor Manager
6.5.5. Discussion
6.6. Evaluation
6.6.1. Performance and Energy Results
6.6.2. Performance Overheads
6.6.3. Performance Sensitivity
6.6.4. Impact of the Optimization Techniques
6.7. Summary
Ⅶ. Coordinated Management of Cores, Memory, and Compressed Memory Swap for QoS-Aware and Efficient Workload Consolidation for Memory-Intensive Applications
7.1. Introduction
7.2. Background
7.2.1. Memory Reclaim and CMS
7.2.2. Workload Consolidation
7.3. Experimental Methodology
7.3.1. System Configuration
7.3.2. Benchmarks
7.4. Characterization
7.5. Design and Implementation
7.5.1. Profiler
7.5.2. System State Space Explorer
7.5.3. Resource Allocator
7.6. Evaluation
7.6.1. QoS and Throughput
7.6.2. Sensitivity
7.6.3. Explored System States
7.6.4. Dynamic Resource Management
7.7. Summary
Ⅷ. Conclusion
References
List of Figures
Figure 1. Hardware and software stacks for deep-learning inference on heterogeneous embedded systems
Figure 2. Evaluated heterogeneous embedded system and power monitor
Figure 3. Performance heterogeneity of inference workloads
Figure 4. Energy heterogeneity of inference workloads
Figure 5. Communication overheads
Figure 6. Overall architecture of MOSAIC
Figure 7. Communication time with various tensor sizes
Figure 8. Inference latency
Figure 9. Inference energy
Figure 10. Latency impact of the MOSAIC components
Figure 11. Energy impact of the MOSAIC components
Figure 12. Inference latency with smaller models
Figure 13. Inference energy with smaller models
Figure 14. Latency estimation accuracy
Figure 15. Energy estimation accuracy
Figure 16. Overheads for performance optimization
Figure 17. Overheads for energy optimization
Figure 18. Overall architecture of HERTI
Figure 19. DQN architecture of MSEP
Figure 20. Inference latency
Figure 21. Inference energy
Figure 22. Sensitivity to the inference deadline
Figure 23. Sensitivity to the system heterogeneity
Figure 24. Generality of HERTI
Figure 25. Energy-delay product
Figure 26. Training time comparison
Figure 27. Networks with linear and non-linear connections
Figure 28. Per-operation execution time of VGG
Figure 29. Execution time breakdowns
Figure 30. Per-operation execution time of GN
Figure 31. Tensor characteristics of VGG
Figure 32. Tensor characteristics of GN
Figure 33. Overall architecture of HALO
Figure 34. Overall performance results
Figure 35. Execution breakdowns with HALO and various memory management policies
Figure 36. Memory traffic
Figure 37. Energy consumption breakdowns
Figure 38. Performance overheads of HALO
Figure 39. Sensitivity to the application working-set size
Figure 40. Impact of the optimization techniques
Figure 41. Impact of cores, memory, and CMS allocated to the LC container with low load and low MOR
Figure 42. Impact of cores, memory, and CMS allocated to the LC container with low load and high MOR
Figure 43. Impact of cores, memory, and CMS allocated to the LC container with high load and low MOR
Figure 44. Impact of cores, memory, and CMS allocated to the LC container with high load and high MOR
Figure 45. Overall architecture of COSMOS
Figure 46. Execution flow of the system state space explorer
Figure 47. Quality of service
Figure 48. Effective machine utilization
Figure 49. Sensitivity to the memory overcommit ratio
Figure 50. Sensitivity to the load for the LC container
Figure 51. Sensitivity to the load and memory overcommit ratio
Figure 52. Number of explored system states
Figure 53. Effectiveness of dynamic resource management
A wide range of applications have become data-intensive as they operate on the massive amounts of data generated by social network services, multimedia devices, and Internet of Things sensors. These data-intensive applications typically require enormous computational and memory resources to extract useful information from the data they process. To accommodate these demands, the hardware resources in computing systems are becoming highly heterogeneous. Specifically, numerous hardware accelerators, such as tensor processing units (TPUs) and neural processing units (NPUs), have been developed to address the ever-increasing computing demands of deep-learning applications. In addition, new memory devices, such as high-bandwidth memory (HBM) and non-volatile memory (NVM), have been developed to meet the growing demand for higher memory performance, capacity, and cost-efficiency.
Heterogeneous computing and memory have great potential to significantly improve the performance and efficiency of data-intensive applications. Taking full advantage of them, however, poses significant challenges to system software: it is the underlying system software that must manage the heterogeneous computing and memory resources effectively so as to maximize the metric of interest, such as performance or energy efficiency. This dissertation presents heterogeneity-aware resource management techniques that significantly improve the performance and efficiency of data-intensive applications by effectively exploiting heterogeneous computing and memory resources.
First, we investigate system software techniques that effectively schedule computations on heterogeneous computing devices for efficient deep-learning inference. To this end, we propose MOSAIC, a software-based system for heterogeneity-, communication-, and constraint-aware model slicing and execution for accurate and efficient inference on heterogeneous embedded systems. MOSAIC employs accurate models for estimating the execution and communication costs of the target inference workload and, using a dynamic-programming-based algorithm, generates an efficient model slicing and execution plan for that workload.
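To make the dynamic-programming formulation concrete, the sketch below shows one way such a planner could be structured. It is a minimal illustration under assumed interfaces, not MOSAIC's actual implementation: the function names, the per-layer placement granularity, and the cost tables standing in for the execution and communication estimators are all assumptions.

```python
# Hypothetical sketch of DP-based model slicing across heterogeneous
# devices; exec_cost and comm_cost stand in for MOSAIC's estimators.

def plan_slicing(n_layers, n_devices, exec_cost, comm_cost):
    """exec_cost[l][d]: estimated time of layer l on device d.
    comm_cost[l][s][d]: estimated time to move layer l's output
    tensor from device s to device d (0 when s == d)."""
    INF = float("inf")
    # best[l][d]: minimal cost of layers 0..l with layer l placed on d
    best = [[INF] * n_devices for _ in range(n_layers)]
    prev = [[0] * n_devices for _ in range(n_layers)]
    for d in range(n_devices):
        best[0][d] = exec_cost[0][d]
    for l in range(1, n_layers):
        for d in range(n_devices):
            for s in range(n_devices):
                c = best[l - 1][s] + comm_cost[l - 1][s][d] + exec_cost[l][d]
                if c < best[l][d]:
                    best[l][d], prev[l][d] = c, s
    # Recover the device assignment by backtracking the DP table
    d = min(range(n_devices), key=lambda x: best[n_layers - 1][x])
    plan = [d]
    for l in range(n_layers - 1, 0, -1):
        d = prev[l][d]
        plan.append(d)
    plan.reverse()
    return plan, best[n_layers - 1][plan[-1]]
```

The O(layers x devices^2) recurrence mirrors the trade-off described above: each layer's placement weighs its execution cost on a device against the cost of communicating the preceding output tensor between devices.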
Second, we propose HERTI, a reinforcement learning-augmented system for efficient real-time inference on heterogeneous embedded systems. Through reinforcement learning, HERTI efficiently explores the state space and robustly finds a state that significantly improves the efficiency of the target inference workload while satisfying its deadline constraint. In addition, HERTI significantly accelerates training by drawing on accurate and lightweight cost estimators.
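The sketch below illustrates this idea with a tabular Q-learning stand-in for HERTI's deep Q-network; the state and action encodings, the reward shaping, and the estimator hooks (est_latency, est_energy) are assumptions made for illustration. Because the reward comes from cost estimators rather than on-device measurements, every training step is cheap, which is the intuition behind the training speedup.

```python
# Tabular Q-learning stand-in for a DQN-based planner. Hypothetical
# estimator callbacks supply the reward, so training never has to run
# the workload on real hardware.
import random

def train_planner(states, actions, step, est_latency, est_energy,
                  deadline, episodes=500, alpha=0.1, gamma=0.9, eps=0.2):
    """states/actions: lists of hashable configurations and knob moves.
    step(s, a) must return a state from the enumerated state space."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(50):  # bounded episode length
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda x: Q[(s, x)]))
            s2 = step(s, a)
            # Reward favors low estimated energy and penalizes states
            # whose estimated latency would miss the deadline
            r = -est_energy(s2) - (100.0 if est_latency(s2) > deadline
                                   else 0.0)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)]
                                                  for x in actions)
                                  - Q[(s, a)])
            s = s2
    return Q
```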
Third, we investigate a system software technique that effectively manages heterogeneous memory for high-performance deep learning. We analyze the characteristics of representative deep-learning workloads on a real heterogeneous memory system. Guided by the characterization results, we propose HALO, hotness- and lifetime-aware data placement and migration for high-performance deep learning on heterogeneous memory systems. HALO extracts hotness and lifetime information for the tensors of the target deep-learning application from its dataflow graph, and then dynamically places and migrates the tensors across heterogeneous memory nodes based on these characteristics.
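A minimal sketch of how hotness and lifetime might be derived from an execution-ordered dataflow graph and used for placement is shown below; the graph encoding, the size table, and the allocation hooks are hypothetical, and HALO's actual analyzers, tensor combiner, and migration machinery go well beyond this.

```python
# Hypothetical sketch: derive per-tensor hotness (consumer count) and
# lifetime (first/last use step) from an execution-ordered dataflow
# graph, then greedily place the hottest tensors on the fast node.

def analyze_tensors(graph):
    """graph: list of ops in execution order, each op given as
    (op_name, input_tensor_ids, output_tensor_ids)."""
    info = {}
    for t_step, (_, inputs, outputs) in enumerate(graph):
        for t in outputs:  # a tensor is born when its producer runs
            info[t] = {"first": t_step, "last": t_step, "hotness": 0}
        for t in inputs:   # each consumer extends lifetime, adds hotness
            # Tensors with no producer (weights, graph inputs) are
            # treated as live from step 0
            rec = info.setdefault(t, {"first": 0, "last": t_step,
                                      "hotness": 0})
            rec["last"] = t_step
            rec["hotness"] += 1
    return info

def place_tensors(info, sizes, fast_budget, alloc_fast, alloc_slow):
    """Put the hottest tensors on the fast memory node until its budget
    is exhausted; the rest go to the capacity node. Tensors past their
    last-use step could then be migrated out to reclaim fast memory."""
    used = 0
    for t in sorted(info, key=lambda t: info[t]["hotness"], reverse=True):
        if used + sizes[t] <= fast_budget:
            alloc_fast(t)
            used += sizes[t]
        else:
            alloc_slow(t)
```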
Finally, we investigate a system software technique for QoS-aware and efficient workload consolidation on heterogeneous memory systems based on software-defined far memory. We conduct an in-depth characterization of the impact of cores, memory, and compressed memory swap (CMS) on the QoS and throughput of consolidated latency-critical (LC) and batch applications. Guided by the characterization results, we propose COSMOS, a software-based runtime system that coordinates the management of cores, memory, and CMS for QoS-aware and efficient consolidation of memory-intensive workloads. COSMOS dynamically collects runtime data from the consolidated applications and the underlying system, and allocates resources to the consolidated applications in a way that achieves high throughput with strong QoS guarantees.
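As a rough illustration of the coordination loop, the sketch below adjusts the knobs from observed tail latency; every hook (tail_latency, set_cores, set_cms) is a hypothetical placeholder for cgroup and compressed-swap interfaces, and unlike this simple stepwise controller, COSMOS chooses allocations by searching a profiled system-state space.

```python
# Hypothetical feedback loop coordinating cores and CMS between a
# latency-critical (LC) container and a batch container; the memory
# limit would be adjusted analogously.
import time

CMS_STEP = 64 << 20  # arbitrary 64 MiB step for the CMS knob

def control_loop(lc, batch, qos_target, period=1.0):
    """lc/batch: handles exposing hypothetical monitoring and
    actuation hooks for the two consolidated containers."""
    while True:
        lat = lc.tail_latency()  # observed LC tail latency this period
        if lat > qos_target:
            # QoS at risk: shift a core from batch to LC and push more
            # of batch's cold pages into the compressed memory swap
            if batch.cores > 1:
                batch.set_cores(batch.cores - 1)
                lc.set_cores(lc.cores + 1)
            batch.set_cms(batch.cms + CMS_STEP)
        elif lat < 0.8 * qos_target:
            # Ample slack: return resources to batch for throughput
            if lc.cores > 1:
                lc.set_cores(lc.cores - 1)
                batch.set_cores(batch.cores + 1)
            batch.set_cms(max(0, batch.cms - CMS_STEP))
        time.sleep(period)
```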