Integrative Statistical Methods to Understand the Genetic Basis of Complex Trait

Author: Gleb Kichaev
Publisher:
ISBN:
Category :
Languages : en
Pages : 166

Book Description
The genome-wide association study (GWAS) is one of the primary tools for understanding the genetic basis of complex traits. In this dissertation I introduce enhanced statistical methods for integrative GWAS analysis with functional genomic data. First, I describe an integrative fine-mapping framework to prioritize causal variants at known GWAS risk loci. Next, I expand upon this framework to exploit genetic heterogeneity across human populations to improve statistical efficiency. I then consider a new inference strategy to reduce the computational burden of the methodology. Finally, I propose a new approach for GWAS discovery that leverages functional genomic data through polygenic modeling.

Using Large-scale Genomics Data to Understand the Genetic Basis of Complex Traits

Author: Ruowang Li
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
With the arrival of big data in genetics in the past decade, the field has experienced drastic changes. One game-changing breakthrough in genetics was the invention of genotyping and sequencing technology that allows researchers to examine single nucleotide polymorphisms (SNPs) across the entire genome. The other major breakthrough was the identification of haplotypes of common alleles in major human populations, which permitted the design of genotyping assays that effectively cover entire human genomes at a resolution appropriate for genetic mapping. Together, these technological breakthroughs have permitted researchers to carry out genome-wide association studies (GWAS) on a wide range of traits including, for example, height and disease status. With GWAS, causal SNPs have been identified for some Mendelian traits, but for more complex genetic traits, the heritability explained by the associated SNPs is low. In addition, high-throughput technologies to generate other types of -omics data such as gene expression, DNA methylation, and protein levels have also emerged recently. How to best utilize the SNP data and other multi-omics data to understand genetic traits is one of the most important questions in the field today. With the increasing prevalence of multi-omics data, new types of analysis schemes and tools are needed to handle the additional complexity of the data. In particular, two areas of method development are in great need. First, statistical methods employed by GWAS do not consider the potential interacting relationships among genetic loci. Thus, methods that can explore the joint effect between multiple genetic loci or genetic factors could unveil new associations. Second, different types of omics data may give distinctive representations of the overall biological system. By combining multi-omics data, we could potentially aggregate non-overlapping information from each individual data type.
Thus, the focus of this dissertation is on developing and improving computational methods that can jointly model multiple types of genomics data. First, an evaluation of an existing method, grammatical evolution neural network, was conducted to identify the optimal algorithm settings for the detection of genetic associations. It was found that under certain algorithm settings, the neural networks were restricted to simple one-layer networks. Using a parameter sweep approach, the analysis identified optimal settings that allow for building more flexible network structures. Then, the algorithm was applied to integrate multi-omics data to model drug-induced cytotoxicity for a number of cancer drugs. By combining different types of omics data including SNPs, gene expression and methylation levels, we were able to model a higher portion of the observed variability than any individual data type alone. However, one drawback of the existing neural network approach is its limited interpretability. To this end, a new algorithm based on Bayesian Networks was created. One novelty of the approach is the ability to independently fit a distinct Bayesian Network for each category of a phenotype. This allows for identifying category-specific interactions as well as common interactions across different categories. Analysis using simulated SNP data has shown that the Bayesian Network approach outperformed the Neural Network approach in many settings, particularly in situations where the data contain multiple interacting loci. When applied to a type 2 diabetes dataset, the algorithm was able to identify distinctive interaction patterns between cases and controls. Ultimately, the goal of this dissertation has been to fully take advantage of the newly available data to understand the genetic basis of complex traits.

Computational Methods to Analyze Large-scale Genetic Studies of Complex Human Traits

Author: Huwenbo Shi
Publisher:
ISBN:
Category :
Languages : en
Pages : 163

Book Description
Large-scale genome-wide association studies (GWAS) have produced a rich resource of genetic data over the past decade, creating a need for computational and statistical methods that can analyze these data. This dissertation presents four statistical methods that model the correlation structure between genetic variants and its effect on GWAS summary association statistics to help understand the genetic basis of complex human traits and diseases. The first method employs the multivariate Bernoulli distribution to model haplotype data, allowing for higher-order interactions among genetic variants, and shows better accuracy in predicting DNase I hypersensitivity status. The second method partitions heritability into small regions on the genome using GWAS summary statistics data, while accounting for complex correlation structures among genetic variants, and uncovers the genetic architectures of complex human traits and diseases. Extending the second method to pairs of traits, the third method partitions genetic correlation into small genomic regions using GWAS summary statistics data, and provides insights into the shared genetic basis between pairs of traits. Finally, the fourth method dissects population-specific and shared causal genetic variants of complex traits in two continental populations, using GWAS summary statistics data obtained from samples of different ethnicities, and reveals differences in the genetic architectures of the two continental populations.

Statistical Methods to Understand the Genetic Architecture of Complex Traits

Author: Farhad Hormozdiari
Publisher:
ISBN:
Category :
Languages : en
Pages : 239

Book Description
Genome-wide association studies (GWAS) have successfully identified thousands of risk loci for complex traits. Identifying these variants requires annotating all possible variations between any two individuals, followed by detecting the variants that affect the disease status or traits. High-throughput sequencing (HTS) advancements have made it possible to sequence cohorts of individuals efficiently in terms of both cost and time. However, HTS technologies have raised many computational challenges. I first propose an efficient method to recover dense genotype data by leveraging low sequencing and imputation techniques. Then, I introduce a novel statistical method (CNVeM) to identify copy-number variation (CNV) loci using HTS data. CNVeM was the first method to incorporate multi-mapped reads, which are discarded by all existing methods. Unfortunately, among all GWAS variants, only a handful have been successfully validated as biologically causal variants. Identifying causal variants can help us understand the biological mechanisms of traits or diseases. However, detecting the causal variants is challenging due to linkage disequilibrium (LD) and the fact that some loci contain more than one causal variant. In my thesis, I will introduce CAVIAR (CAusal Variants Identification in Associated Regions), a new statistical method for fine mapping. The main advantage of CAVIAR is that we predict a set of variants for each locus that will contain all of the true causal variants with a high confidence level (e.g. 95%) even when the locus contains multiple causal variants. Next, I aim to understand the underlying mechanism of GWAS risk loci. A standard approach to uncover the mechanism of GWAS risk loci is to integrate results of GWAS and expression quantitative trait loci (eQTL) studies; we attempt to identify whether or not a significant GWAS variant also influences expression at a nearby gene in a specific tissue.
However, detecting the same variant being causal in both GWAS and eQTL is challenging due to complex LD structure. I will introduce eCAVIAR (eQTL and GWAS CAusal Variants Identification in Associated Regions), a statistical method to compute the probability that the same variant is responsible for both the GWAS and eQTL signal, while accounting for complex LD structure. We integrate a meta-analysis of glucose and insulin-related traits with GTEx to detect the target genes and the most relevant tissues. Interestingly, we observe that most loci do not colocalize between GWAS and eQTL. Lastly, I propose an approach called phenotype imputation that allows one to perform GWAS on a phenotype that is difficult to collect. In our approach, we leverage the correlation structure between multiple phenotypes to impute the uncollected phenotype. I demonstrate that we can analytically calculate the statistical power of an association test using the imputed phenotype, which can be helpful for study design purposes.
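
The idea of a variant set that contains the causal variant with a chosen confidence level can be illustrated with a much simpler case than the one the abstract describes. The sketch below assumes exactly one causal variant per locus and takes per-variant posterior probabilities (summing to 1) as given; the function name `credible_set` and the example variant IDs are hypothetical. It is not CAVIAR, which models multiple causal variants jointly while accounting for LD.

```python
def credible_set(posteriors, level=0.95):
    """Smallest set of variants whose summed posterior probability reaches `level`.

    Assumes a single causal variant at the locus, so the per-variant
    posterior probabilities are mutually exclusive and sum to 1.
    """
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= level:
            break
    return chosen


# Hypothetical posterior probabilities for four variants at one locus.
example = {"rs1": 0.60, "rs2": 0.25, "rs3": 0.11, "rs4": 0.04}
print(credible_set(example))
```

With these illustrative numbers, the first three variants together carry 96% of the posterior mass, so the fourth is left out of the 95% set.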

Systems Genetics

Author: Florian Markowetz
Publisher: Cambridge University Press
ISBN: 131638098X
Category : Science
Languages : en
Pages : 287

Book Description
Whereas genetic studies have traditionally focused on explaining inheritance of single traits and their phenotypes, recent technological advances have made it possible to comprehensively dissect the genetic architecture of complex traits and quantify how genes interact to shape phenotypes. This exciting new area has been termed systems genetics and is born out of a synthesis of multiple fields, integrating a range of approaches and exploiting our increased ability to obtain quantitative and detailed measurements on a broad spectrum of phenotypes. Gathering the contributions of leading scientists, both computational and experimental, this book shows how experimental perturbations can help us to understand the link between genotype and phenotype. A snapshot of current research activity and state-of-the-art approaches to systems genetics is provided, including work from model organisms such as Saccharomyces cerevisiae and Drosophila melanogaster, as well as from human studies.

Methods and Models for the Analysis of Genetic Variation Across Species Using Large-scale Genomic Data

Author: Tanya Ngoc Phung
Publisher:
ISBN:
Category :
Languages : en
Pages : 213

Book Description
Understanding how different evolutionary processes shape genetic variation within and between species is an important question in population genetics. The advent of next-generation sequencing has allowed for many theories and hypotheses to be tested explicitly with data. However, questions such as what evolutionary processes affect neutral divergence (DNA differences between species) or genetic variation in different regions of the genome (such as on autosomes versus sex chromosomes) or how many genetic variants contribute to complex traits are still outstanding. In this dissertation, I utilized different large-scale genomic datasets and developed statistical methods to determine the role of natural selection on genetic variation between species, sex-biased evolutionary processes on shaping patterns of genetic variation on the X chromosome and autosomes, and how population history, mutation, and natural selection interact to control complex traits. First, I used genome-wide divergence data between multiple pairs of species ranging in divergence time to show that natural selection has reduced divergence at neutral sites that are linked to those under direct selection. To determine explicitly whether and to what extent linked selection and/or mutagenic recombination could account for the pattern of neutral divergence across the genome, I developed a statistical method and applied it to a human-chimp neutral divergence dataset. I showed that a model including both linked selection and mutagenic recombination resulted in the best fit to the empirical data. However, the signal of mutagenic recombination could be coming from biased gene conversion. Comparing genetic diversity between the X chromosome and the autosomes could provide insights into whether and how sex-biased processes have affected genetic variation between different genomic regions.
For example, an X/A diversity ratio greater than the neutral expectation could be due to more X chromosomes than expected and could be a result of mating practices such as polygamy where there are more reproducing females than males. I next utilized whole-genome sequences from dogs and wolves and found that X/A diversity is lower than the neutral expectation in both dogs and wolves on ancient time-scales, arguing for evolutionary processes resulting in more males reproducing compared to females. However, within breed dogs, patterns of population differentiation suggest that there have been more reproducing females, highlighting effects from breeding practices such as the popular sire effect where one male can father many offspring with multiple females. In medical genetics, a complete understanding of the genetic architecture is essential to unravel the genetic basis of complex traits. While genome-wide association studies (GWAS) have discovered thousands of trait-associated variants and thus have furthered our understanding of the genetic architecture, key parameters such as the number of causal variants and the mutational target size are still under-studied. Further, the role of natural selection in shaping the genetic architecture is still not entirely understood. In the last chapter, I developed a computational method called InGeAr to infer the mutational target size and explore the role of natural selection in affecting the variant's effect on the trait. I found that the mutational target size differs from trait to trait and can be large, up to tens of megabases. In addition, purifying selection is coupled with the variant's effect on the trait. I discussed how these results support the omnigenic model of complex traits.
In summary, in this dissertation, I utilized different types of large genomic datasets, from genome-wide divergence data to whole-genome sequence data to GWAS data, to develop models and statistical methods to study how different evolutionary processes have shaped patterns of genetic variation across the genome.
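
The "neutral expectation" for the X/A diversity ratio mentioned above can be made concrete with the textbook effective-population-size formulas for autosomes and the X chromosome. The sketch below is an illustration only, assuming a constant-size, neutrally evolving population with equal mutation rates on the X and the autosomes; the function name `xa_ratio` is hypothetical and this is not the dissertation's method.

```python
def xa_ratio(n_females, n_males):
    """Expected X/autosome diversity ratio from breeding sex counts.

    Uses the standard effective population sizes:
      autosomes: N_A = 4 * Nm * Nf / (Nm + Nf)
      X:         N_X = 9 * Nm * Nf / (4 * Nm + 2 * Nf)
    Assumes neutrality, constant size, and equal mutation rates.
    """
    n_auto = 4 * n_males * n_females / (n_males + n_females)
    n_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return n_x / n_auto


print(xa_ratio(1000, 1000))  # equal sex ratio gives the familiar 3/4
print(xa_ratio(2000, 500))   # more breeding females: ratio above 3/4
print(xa_ratio(500, 2000))   # more breeding males: ratio below 3/4
```

Under these assumptions the ratio is bounded between 9/16 (males vastly outnumber females) and 9/8 (females vastly outnumber males), which is why an observed ratio on either side of 3/4 is read as evidence of a sex bias.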

Methods for the Quantitative Characterization of the Genetic Basis of Human Complex Traits

Author: Kathryn Burch
Publisher:
ISBN:
Category :
Languages : en
Pages : 128

Book Description
A major finding from the last decade of genome-wide association studies (GWAS) is that variant-phenotype associations are significantly enriched in noncoding regulatory regions of the genome. This result suggests that GWAS associations localize variants that modulate phenotype via gene regulation as opposed to alterations in protein structure/function. However, for most complex traits, most aspects of genetic architecture, including the number of causal variants/genes for a trait and the degree to which causal effect sizes are coupled with genomic features such as minor allele frequency (MAF) and linkage disequilibrium (LD), remain actively debated. In this dissertation, I introduce three new methods to explore and quantitatively characterize complex-trait genetic architecture. First, I derive an unbiased estimator of genome-wide SNP-heritability under a very general random effects model that makes minimal assumptions on the underlying (unknown) genetic architecture of the trait. Second, I introduce a method for estimating the number of causal variants that are shared between two ancestral populations for a given trait, and I discuss the implications of the method and real-data results for improving polygenic risk prediction in ethnic minority populations. Third, I propose methods for partitioning the heritability of individual genes by MAF to identify disease-relevant genes, with the hypothesis that some disease-relevant genes may have relatively large heritability contributions from rare and low-frequency variants while still having low total gene-level heritability.

Computational Genetic Approaches for Understanding the Genetic Basis of Complex Traits

Author: Eun Yong Kang
Publisher:
ISBN:
Category :
Languages : en
Pages : 273

Book Description
Recent advances in genotyping and sequencing technology have enabled researchers to collect an enormous amount of high-dimensional genotype data. These large-scale genomic data provide an unprecedented opportunity for researchers to study and analyze the genetic factors of human complex traits. One of the major challenges in analyzing these high-throughput genomic data is the need for effective and efficient computational methodologies. In this thesis, I introduce several methodologies for analyzing these genomic data, which facilitate our understanding of the genetic basis of complex human traits. First, I introduce a method for inferring biological networks from high-throughput data containing both genetic variation information and gene expression profiles from genetically distinct strains of an organism. For this problem, I use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. In particular, I utilize prior biological knowledge that genetic variations affect gene expressions, but not vice versa, which allows us to direct the subsequent edges between two gene expression levels. Predicting both the presence and the absence of causal relationships between gene expressions can help distinguish between direct and indirect effects of variation on gene expression levels. I demonstrate the utility of our approach by applying it to a data set containing 112 yeast strains, where the proposed method identifies the known "regulatory hotspot" in yeast. Second, I introduce an efficient pairwise identity by descent (IBD) association mapping method, which utilizes importance sampling to improve efficiency and enables approximation of extremely small p-values. Two individuals are IBD at a locus if they have identical alleles inherited from a common ancestor.
One popular approach to find the association between IBD status and disease phenotype is the pairwise method, where one compares the IBD rate of case/case pairs to the background IBD rate to detect excessive IBD sharing between cases. One challenge of the pairwise method is computational efficiency. In the pairwise method, one uses permutation to approximate p-values because it is difficult to analytically obtain the asymptotic distribution of the statistic. Since the p-value threshold for genome-wide association studies (GWAS) is necessarily low due to multiple testing, one must perform a large number of permutations, which can be computationally demanding. I present Fast-Pairwise to overcome the computational challenges of the traditional pairwise method by utilizing importance sampling to improve efficiency and enable approximation of extremely small p-values. Using the WTCCC type 1 diabetes data, I show that Fast-Pairwise can successfully pinpoint a gene known to be associated with the disease within the MHC region. Finally, I introduce a novel meta-analytic approach to identify gene-by-environment interactions by aggregating multiple studies with varying environmental conditions. Identifying environmentally specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally dependent effects. In this project, I jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions.
Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. I apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in high-density lipoprotein (HDL) cholesterol, many of which show significant evidence of involvement in gene-by-environment interactions.
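
To illustrate what "interactions can be interpreted as heterogeneity" might look like in practice, the sketch below computes Cochran's Q, a standard heterogeneity statistic from meta-analysis, for one variant's effect estimates across studies run under different environments. This is a generic textbook statistic swapped in for illustration, not the dissertation's random effects test; the function name `cochran_q` and the input numbers are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Inverse-variance pooled effect and Cochran's Q across studies.

    effects:    per-study effect estimates for one variant
    std_errors: matching per-study standard errors
    Under homogeneity, Q is approximately chi-square distributed
    with (number of studies - 1) degrees of freedom.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return pooled, q


# Hypothetical effect of one locus in two studies with different environments.
pooled, q = cochran_q([0.0, 0.4], [0.1, 0.1])
print(pooled, q)
```

A large Q relative to its chi-square reference suggests that the effect differs between studies, which is the pattern one would expect if the locus interacts with the environmental condition that varies across them.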

Dissecting Genetic Basis of Complex Traits by Haplotype-based Association Studies and Integrated Information from Multiple Data Sources

Author: Yixuan Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 151

Book Description
Characterization of genetic variation and dissection of the genetic architectures of complex diseases are critical to understanding their intrinsic mechanisms. Haplotype methods have shown improved power and more consistent results compared with single-locus based approaches. We propose a new haplotype-based association method for family data. Our approach (termed F_HapMiner) first infers diplotype pairs of each individual in each pedigree assuming no recombination within a family. A phenotype score is then defined for each founder haplotype. Finally, F_HapMiner applies a clustering algorithm on founder haplotypes based on their similarities and identifies haplotype clusters that show significant associations with diseases/traits. Comparisons with single-locus and haplotype-based Transmission Disequilibrium Test (TDT) methods demonstrate that our approach consistently outperforms the TDT-based approaches regardless of disease models, local linkage disequilibrium (LD) structures or allele/haplotype frequencies. Traditional linkage analysis and association studies may result in hundreds of candidate genes. We propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using the diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is utilized to rank all candidate genes. Based on the informativeness of available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed large-scale cross-validation analysis using three data sources based on protein interactions, gene expressions and pathway information.
Results have shown that our approach consistently outperforms two other state-of-the-art programs. A web tool has been implemented to assist scientists in their genetic studies. Researchers commonly rely on simulated data to evaluate their approaches for detecting high-order interactions in disease gene mapping. A publicly available simulation program is therefore of great interest. We have developed a computer program, gs, to quickly generate a large number of samples based on real data. Two approaches have been implemented to generate dense SNP haplotype/genotype data that share similar local LD patterns with those in human populations. The first approach takes haplotype pairs from samples as inputs, and the second approach takes patterns of haplotype block structures as inputs. The improved version of gs provides greater functionality and flexibility to simulate various interaction models. Data generated can serve as a common ground to compare different approaches in detecting interactions.

Medical Epigenetics

Author: Trygve Tollefsbol
Publisher: Academic Press
ISBN: 0128032405
Category : Science
Languages : en
Pages : 944

Book Description
Medical Epigenetics provides a comprehensive analysis of the importance of epigenetics to health management. The purpose of this book is to fill a current need for a comprehensive volume on the medical aspects of epigenetics with a focus on human systems, epigenetic diseases that affect these systems and modes of treating epigenetic-based disorders and diseases. The intent of this book is to provide a stand-alone comprehensive volume that will cover all human systems relevant to epigenetic maladies and all major aspects of medical epigenetics. The overall goal is to provide the leading book on medical epigenetics that will be useful not only to physicians, nurses, medical students and many others directly involved with health care, but also investigators in life sciences, biotech companies, graduate students and many others who are interested in more applied aspects of epigenetics. Research in the area of translational epigenetics is a cornerstone of this volume. Key features include critical reviews dedicated to the burgeoning role of epigenetics in medical practice; coverage of emerging topics including twin epigenetics as well as epigenetics of gastrointestinal disease, muscle disorders, endocrine disorders, ocular medicine, pediatric diseases, sports medicine, noncoding RNA therapeutics, pain management and regenerative medicine; and a disease-oriented perspective of medical epigenetics as well as diagnostic and prognostic epigenetic approaches to applied medicine.