Title Page
Contents
ABSTRACT 12
Ⅰ. Introduction 15
Ⅱ. Analysis of functional genomic variants for Nanchukmacdon pig 20
2.1. Background 20
2.2. Materials and Methods 22
2.2.1. DNA and RNA sequencing 22
2.2.2. Genome assembly 25
2.2.3. Benchmarking scaffolding approaches 26
2.2.4. Assembly quality assessment 29
2.2.5. Closing gaps in the pig reference genome assembly 29
2.2.6. Gene annotation 30
2.2.7. Repeat annotation 33
2.2.8. Collinearity comparison of the chromosome-level assemblies of pig breeds 33
2.2.9. Variant analysis 34
2.2.10. Integrative multi-omics analysis for NSV flanking genes 34
2.3. Results 36
2.3.1. De novo genome assembly of Nanchukmacdon 36
2.3.2. Genome annotation 41
2.3.3. Benchmarking scaffolding approaches 47
2.3.4. Collinearity comparison with chromosome-level assemblies of pig breeds 49
2.3.5. Identification of structural variants related to Nanchukmacdon-specific phenotypes 55
2.3.6. Regulatory potentials of NSVs affecting the expression of nearby genes 59
Ⅲ. Analysis of functional genomic variants for minipigs 60
3.1. Background 60
3.2. Materials and Methods 62
3.2.1. Variant calling, evaluation and annotation 62
3.2.2. Between-breed analyses 66
3.2.3. Admixture analyses 66
3.2.4. Within-breed analyses 67
3.2.5. Identification of selective sweeps 67
3.2.6. Differentiation analysis of genes regulating body size 68
3.2.7. Transcriptome analysis 68
3.3. Results 70
3.3.1. SNP identification 70
3.3.2. Genetic structure of the minipig population 74
3.3.3. Demographic signatures of the minipig population 82
3.3.4. Selection signatures by domestication in the minipig population 88
3.3.5. A variance of genomic signatures related to body size in the minipig population 92
3.3.6. Characterization of candidate genes regulating body size of pigs 100
Ⅳ. Development of a full homology-based synteny block detection program 106
4.1. Background 106
4.2. Materials and Methods 109
4.2.1. SHERLOG 109
4.2.2. Definition of a synteny block 109
4.2.3. Input 111
4.2.4. Algorithm 111
4.2.5. Performance evaluation 114
4.2.6. Application to a phased genome assembly 115
4.3. Results 116
4.3.1. Evaluation of SHERLOG in terms of synteny block coverage 116
4.3.2. Evaluation of SHERLOG using homologous genes 120
4.3.3. Evaluation of SHERLOG using chromosome painting results 120
4.3.4. Application of SHERLOG to a phased genome assembly 123
Ⅴ. Development of a web-based application for comparative multi-omics analysis of multiple genomes 125
5.1. Background 125
5.2. Materials and Methods 127
5.2.1. Comparative genomic analysis 127
5.2.2. Differential analysis for omics profile data 127
5.2.3. Implementation 128
5.2.4. Utility assessment 128
5.2.5. Applications 129
5.3. Results 132
5.3.1. Implementation 132
5.3.2. Command line interface 134
5.3.3. Web interface 135
5.3.4. Utility assessment about inter- and intra-species analysis 139
Ⅵ. Discussion 151
References 161
Abstract (in Korean) 179
Table 2-1. Summary of sequencing data of Nanchukmacdon. 23
Table 2-2. Parameters used for the RACA run. 27
Table 2-3. The RNA sequencing data used in this study. 35
Table 2-4. Statistics of the NCMD and pig reference genome assembly (Sscrofa11.1). The statistics were calculated using only chromosome-level scaffolds of each assembly. 38
Table 2-5. The list of BUSCO genes found in NCMD annotation, but missed in pig reference (Sscrofa11.1). 42
Table 2-6. Statistics of repetitive elements in the NCMD assembly. 44
Table 2-7. Synteny coverage among pig assemblies. NCMD: Nanchukmacdon, Sscrofa11.1: Duroc, MSCAAS v1: Meishan. 50
Table 2-8. Statistics of structural variants in the Nanchukmacdon and Landrace genomes. NSV: Nanchukmacdon-specific variant, DEL: Deletion, INS: Insertion, DUP: Duplication, INV: Inversion. 57
Table 3-1. Information on the samples used in this study. 63
Table 3-2. Statistics of SNPs of 41 pig breeds and 5 outgroups. ADP: Asian domestic pig, EDP: European domestic pig, AWB: Asian wild boar, EWB: European wild boar. 71
Table 3-3. The genomic distance where LD (Mean r²) dropped to half of its maximum value. ADP: Asian domestic pig, AWB: Asian wild boar, EDP: European domestic pig,... 84
Table 3-4. Results of differential gene expression analyses between minipigs and European domestic pigs in brain, liver and muscle tissue. BM: Bama minipig, KE: ET-... 102
Table 4-1. Comparison of synteny block coverage. Numbers represent the percentage of a genome belonging to synteny blocks. Numbers in parentheses are the percentage of a genome contained in multiple synteny blocks (Chr14: against human chromosome 14, AL: against... 118
Figure 2-1. Length distribution of raw PacBio subreads of Nanchukmacdon. The mean and median subread lengths were 9.90 Kbp (red line) and 7.57 Kbp (blue line), respectively. 24
Figure 2-2. Workflow for the genome assembly of Nanchukmacdon. Grey-colored boxes represent programs, and yellow-, blue-, and red-colored boxes indicate the input... 28
Figure 2-3. Workflow for annotating protein-coding genes. Grey-colored boxes represent programs, and green- and brown-colored boxes respectively indicate their... 32
Figure 2-4. Chromosome-level genome assembly (NCMD) of the Nanchukmacdon pig breed. a Comparison results between NCMD and pig reference genome assembly... 39
Figure 2-5. Genome annotation of NCMD assembly. a Gene annotation statistics for the assemblies of diverse pig breeds. The annotation statistics of 13 pig assemblies except... 46
Figure 2-6. Comparison results between the reference-guided assembly and Hi-C read-based assembly represented by RACA and SALSA2 respectively. a Distributions of... 48
Figure 2-7. Comparison of the NCMD assembly with assemblies of other pig breeds. a Syntenic relationships of Nanchukmacdon chromosomes containing breakpoint regions... 51
Figure 2-8. Mapping patterns of Nanchukmacdon reads at the breakpoint regions of Duroc assembly. The panels show the mapping depths and patterns of paired-end... 52
Figure 2-9. Mapping patterns of Nanchukmacdon reads at the breakpoint regions of Meishan assembly (Same as Figure 2-8) 53
Figure 2-10. Paralogous genes flanking in the breakpoint regions of the Nanchukmacdon against Duroc genome (a) and Meishan genome (b). a Green and red... 54
Figure 2-11. Structural variants (SVs) in Nanchukmacdon and Landrace. a Distribution of SVs in the Nanchukmacdon and Landrace genomes. NSVs represent... 58
Figure 3-1. Geographical distribution of pig breeds and outgroups used in this study. The background map was drawn using ggplot2. Different colored dots indicate... 73
Figure 3-2. PCA results for 216 samples of pig breeds and outgroups by the first and the second principal component (a), and the second and third principal component... 76
Figure 3-3. Estimated phylogenetic relationships among pig populations and outgroups. Different colors represent different pig breeds and outgroups as shown in the legend. 77
Figure 3-4. Pairwise fixation index (pairwise Fst) scores between different pig breeds. Clustering... 78
Figure 3-5. Results of admixture analyses with different values of K (from two to five). Top boxes with different colors indicate different pig populations and outgroups. 79
Figure 3-6. ƒ₃ statistics of minipig breeds for admixture between Asian and European domestic pigs. Green and yellow color represent MP1 and MP2 sub-population,... 80
Figure 3-7. Results of admixture analysis for minipig breeds. a D statistics obtained by comparing to the Asian (ADP) and European domestic pigs (EDP). Green and yellow... 81
Figure 3-8. Different demographic histories of the eight minipig breeds. a Proportion of total sum of ROH (SROH) in different ROH size ranges. Vertical bars represent standard... 85
Figure 3-9. Distribution of SROH and Number of ROH (NROH) for pig individuals. Different colors represent different minipig breeds and pig populations. Cross marks... 86
Figure 3-10. Average linkage disequilibrium between intrachromosomal regions in Bama (a), Göttingen (b), Mini-LEWE (c), ET-type Korean (d), L-type Korean (e),... 87
Figure 3-11. Results of genome-wide selective sweep analyses for minipig sub-populations against the wild boar (WB) population. a Manhattan plots of Z-transformed... 90
Figure 3-12. Significant biological process terms for selective sweeps of each minipig against European domestic pigs. 93
Figure 3-13. Significant molecular function terms for selective sweeps of each minipig against European domestic pigs. 94
Figure 3-14. Significant cellular component terms for selective sweeps of each minipig against European domestic pigs. 95
Figure 3-15. Significant KEGG pathways for selective sweeps of each minipig against European domestic pigs. 96
Figure 3-16. Results of identifying differentiated genomic regions related to body size in the eight minipigs. a Left and right bar plot show the number of differentiated genomic... 98
Figure 3-17. Genotype plots for 10 Kbp sliding regions containing NR6A1 (a), VRTN (b), AR (c) and LCORL gene (d). Yellow, dark blue, and red color indicate homozygous... 99
Figure 3-18. Statistics of candidate genes involved in body size that are commonly identified in all minipigs compared to European domestic pigs (EDP). a, b Nucleotide... 104
Figure 4-1. Definition of a synteny block. The boxes on each genome and ribbons between them represent pairwise alignments between the two genomes. The orientation of... 110
Figure 4-2. Overview of SHERLOG algorithm. Given the pairwise alignments between two genomes, the alignments shorter than a resolution are eliminated in the alignment processing step. The cleaned alignments are used to construct a directed acyclic graph (DAG)... 113
Figure 4-3. Example alignment pattern and comparison of the number of homologous genes between human and chimpanzee. a Grey colored ribbons indicate pairwise... 119
Figure 4-4. Comparison of consistency with the chromosome painting results of the cat and dog genomes. a Homologous dog chromosomes (thin lines with different colors) which... 122
Figure 4-5. Homology detection between the human reference genome assembly (hg38) and a phased human genome assembly (HG00733) by three different programs. a Red... 124
Figure 5-1. Overview of myOmicsPortal for comparative multi-omics analysis across multiple genomes, which consists of a command line interface and a web interface. 133
Figure 5-2. Diverse functions of myOmicsPortal for visualizing complex and large-scale data. a Two different panel arrangements according to different genome layouts in the... 138
Figure 5-3. Comparison between coordinate matching results of myOmicsPortal and liftOver program. Pie charts represent concordance rate between two program results.... 141
Figure 5-4. Results of synteny analysis (a) and genomic coordinate matching analysis (b-f) between human (hg38) and chimpanzee genome (panTro5) using myOmicsPortal.... 142
Figure 5-5. Results of synteny analysis (a) and genomic coordinate matching analysis (b-f) between human (hg38) and rhesus monkey genome (rheMac8) using... 143
Figure 5-6. Results of synteny analysis (a) and genomic coordinate matching analysis (b-f) between human (hg38) and mouse genome (mm10) using myOmicsPortal (Same... 144
Figure 5-7. Application examples of myOmicsPortal for inter-species comparative analysis. For the lollipop chart in overview panel and boxes under the charts, different... 145
Figure 5-8. Comparisons of normalized counts between human and mouse immune cells for gene expression data (a, b) and H3K27ac profile data (c). Yellow-colored points... 147
Figure 5-9. Application examples of myOmicsPortal using multi-omics data of human (hg38), chimpanzee (panTro5) and rhesus monkey (rheMac8). a Overview of synteny... 148
Figure 5-10. Comparisons of normalized counts of H3K27ac profile data among primates in Glu neuron (a) and MGE-GABA neuron (b). Each scatter plot was drawn... 149
Figure 5-11. Application examples of myOmicsPortal for the intra-species comparative analysis. The tracks respectively display the omics data of different types and genomes,... 150