Table of Contents

Title Page

Contents

ABSTRACT 10

Ⅰ. Introduction 13

Ⅱ. Background and Related Work 20

2.1. Definition of Real-time 20

2.2. Activity Recognition 20

2.3. Skeleton-based Activity Recognition 21

2.3.1. PoseC3D 21

2.4. Multi-modality in Activity Recognition 23

2.4.1. VPN 23

2.4.2. VPN++ 25

2.5. GCN-based Activity Recognition 27

2.5.1. HD-GCN 27

2.5.2. SkeletonGCL 30

Ⅲ. Model Design and Implementation 33

3.1. Preliminaries 33

3.1.1. Knowledge Distillation 34

3.1.2. Skeleton-based Activity Recognition 35

3.2. Object Detection 36

3.2.1. CSPDarknet53 38

3.3. Pose Estimation 41

3.4. Activity Recognition 44

Ⅳ. Experiment 47

4.1. Experimental Settings and Datasets 47

4.1.1. Experimental Environment 47

4.1.2. Datasets in Object Detection 47

4.1.3. Datasets in Pose Estimation 48

4.1.4. Datasets in Activity Recognition 50

4.2. Experimental Result 53

4.2.1. Experimental Result of Object Detection 53

4.2.2. Experimental Result of Pose Estimation 54

4.2.3. Overall Experimental Result 54

4.3. Ablation Study 56

4.4. Analysis of Activity Recognition Results 57

Ⅴ. Conclusion 59

References 61

Abstract (in Korean) 65

〈Table 1-1〉 Experimental results of existing activity recognition models on NTU RGB+D 60 16

〈Table 4-1〉 Train Environment 47

〈Table 4-2〉 Inference Environment 47

〈Table 4-3〉 Object detection results on the MS COCO dataset 53

〈Table 4-4〉 Pose estimation results on the MS COCO key points dataset 54

〈Table 4-5〉 Comparisons of the top-1 accuracy (%) against state-of-the-art methods on the NTU-RGB+D 60 54

〈Table 4-6〉 Experimental results of 60 activities from NTURGB+D under real-time video stream 55

〈Table 4-7〉 Ablation experiments for knowledge distillation 56

〈Figure 1-1〉 Percentage of population aged 65 years or over for the world, SDG regions, and selected groups of countries 14

〈Figure 1-2〉 Number of people aged 60 years or over in Asia and the Pacific and by subregion, 1950, 1990, 2022, 2030 and 2050 14

〈Figure 1-3〉 General architecture of elderly healthcare system 17

〈Figure 2-1〉 overall architecture of PoseC3D 21

〈Figure 2-2〉 data flow of PoseC3D 22

〈Figure 2-3〉 overall architecture of VPN 23

〈Figure 2-4〉 Attention Network and Spatial Embedding 24

〈Figure 2-5〉 VPN-F, VPN-A and VPN++ 26

〈Figure 2-6〉 distillation in VPN-F and VPN-A 26

〈Figure 2-7〉 The architecture of HD-GCN I 28

〈Figure 2-8〉 The architecture of HD-GCN II 28

〈Figure 2-9〉 walking 28

〈Figure 2-10〉 The architecture of SkeletonGCL 30

〈Figure 2-11〉 skeleton of the architecture of SkeletonGCL 31

〈Figure 3-1〉 proposed method RAR-Net 33

〈Figure 3-2〉 object detection 37

〈Figure 3-3〉 knowledge distillation 37

〈Figure 3-4〉 network of CSPDarknet53 39

〈Figure 3-5〉 CSP 40

〈Figure 3-6〉 PANet 41

〈Figure 3-7〉 pose estimation 42

〈Figure 3-8〉 network of EfficientHRNet 42

〈Figure 3-9〉 SlowFast 45

〈Figure 3-10〉 activity recognition 45

〈Figure 4-1〉 MS COCO key points dataset 49

〈Figure 4-2〉 running ground truth in key point dataset 50

〈Figure 4-3〉 key points in NTURGB+D 52

〈Figure 4-4〉 skeleton data in NTURGB+D 53

〈Figure 4-5〉 sitting down in baseline (left) and RAR-Net (right) 57

〈Figure 4-6〉 standing up in baseline (left) and RAR-Net (right) 57

〈Figure 4-7〉 drink water in baseline (left) and RAR-Net (right) 57

〈Figure 4-8〉 falling in baseline (left) and RAR-Net (right) 58