Title Page
Contents
ABSTRACT 8
Ⅰ. Introduction 10
Ⅱ. Materials and Methods 12
2.1. Genome Sequencing 12
2.2. Genome Assembly 12
2.3. Synteny Analysis 16
2.4. Variant Calling 16
2.5. Genome Annotation 18
Ⅲ. Results and Discussion 20
3.1. Genome Sequencing 20
3.2. Genome Assembly 24
3.2.1. De novo Assembly 24
3.2.2. Reference-based Scaffolding 32
3.2.3. Gap Filling 38
3.3. Synteny Analysis 42
3.4. Variant Calling 50
3.5. Genome Annotation 61
Ⅳ. CONCLUSION 75
References 77
Appendices 85
[Appendix 1] Output visualization of Google DeepVariant 85
[Appendix 2] Output visualization of PEPPER-Margin-DeepVariant 86
[Appendix 3] Variants annotations and putative impacts 87
[Appendix 4] First round of RepeatMasker using Dfam Oryza repeat database 89
Abstract (in Korean) 91
〈Table 1〉 Depth of coverage of single-end and HiFi reads. 24
〈Table 2〉 QUAST analysis report of Hifiasm, HiCanu, and Flye assembler outputs 28
〈Table 3〉 RagTag scaffolding with various reference genomes 34
〈Table 4〉 QUAST analysis output of RagTag scaffolds 35
〈Table 5〉 Gap position of RagTag scaffolding with different references 40
〈Table 6〉 Gap position of RagTag merge scaffolds 41
〈Table 7〉 Synteny analysis of 3 scaffolds built with different reference genomes 43
〈Table 8〉 Synteny analysis of IR64- and Nipponbare-based pseudomolecules compared with the R498-based pseudomolecule 48
〈Table 9〉 snpEff results for variant calling with the GATK4 pipeline 52
〈Table 10〉 GATK4 variants sorted by region and type 53
〈Table 11〉 snpEff results for variant calling with the DeepVariant pipeline 54
〈Table 12〉 DeepVariant variants sorted by region and type 55
〈Table 13〉 snpEff results for variant calling with the PEPPER-Margin-DeepVariant pipeline 56
〈Table 14〉 PEPPER-Margin-DeepVariant variants sorted by region and type 57
〈Table 15〉 Result of hap.py benchmarking with PEPPER-Margin-DeepVariant 60
〈Table 16〉 Result of RepeatMasker using Geumgang1 RepeatModeler data 62
〈Table 17〉 List of genes related to yield and rice quality traits 70
〈Table 18〉 Selected evidence genes related to stimulus response 74
〈Figure 1〉 Per-base sequence quality and per-sequence quality scores from FastQC: (A) Illumina single-end sequencing reads (B) PacBio HiFi... 23
〈Figure 2〉 GenomeScope k-mer frequency distribution plot of (A) Illumina SE reads and (B) PacBio HiFi reads. 23
〈Figure 3〉 Result of BUSCO analysis of 6 different assembly tools 26
〈Figure 4〉 BUSCO analysis plot of RagTag scaffolds 33
〈Figure 5〉 Bandage plot of (A) Hifiasm contigs and (B) RagTag scaffolds of Geumgang1 genome 37
〈Figure 6〉 Synteny plot of R498 reference genome scaffolds 44
〈Figure 7〉 Synteny plot of IR64 reference genome scaffolds 45
〈Figure 8〉 Synteny plot of Nipponbare reference genome scaffolds 46
〈Figure 9〉 Synteny plot of 3 different reference-based scaffolds 49
〈Figure 10〉 HiFi Read Alignment to R498 reference genome 51
〈Figure 11〉 Variant locations by genetic region: (A) GATK4 variant calling, (B) Google DeepVariant, (C) PEPPER-Margin-DeepVariant 58
〈Figure 12〉 Result of (A) HISAT2 RNA alignment, (B) Braker2 gene prediction 65
〈Figure 13〉 Synteny dotplot of coding sequences compared with (A) Nipponbare and (B) R498 66
〈Figure 14〉 Macrosynteny plot of coding sequences with 3 different varieties 66
〈Figure 15〉 (A) Phylogenetic tree and (B) local synteny analysis of Ghd7 gene region 69