Using Large-scale Genomics Data to Understand the Genetic Basis of Complex Traits

Author: Ruowang Li
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
With the arrival of big data in genetics in the past decade, the field has experienced drastic changes. One game-changing breakthrough in genetics was the invention of genotyping and sequencing technology that allows researchers to examine single nucleotide polymorphisms (SNPs) across the entire genome. The other major breakthrough was the identification of haplotypes of common alleles in major human populations, which permitted the design of genotyping assays that effectively cover entire human genomes at a resolution appropriate for genetic mapping. Together, these technological breakthroughs have permitted researchers to carry out genome-wide association studies (GWAS) on a wide range of traits including, for example, height and disease status. With GWAS, causal SNPs have been identified for some Mendelian traits, but for more complex genetic traits, the heritability explained by the associated SNPs is low. In addition, high-throughput technologies that generate other types of -omics data, such as gene expression, DNA methylation, and protein level data, have also emerged recently. How best to utilize the SNP data and other multi-omics data to understand genetic traits is one of the most important questions in the field today. With the increasing prevalence of multi-omics data, new types of analysis schemes and tools are needed to handle the additional complexity of the data. In particular, two areas of method development are in great need. First, statistical methods employed by GWAS do not consider the potential interacting relationships among genetic loci. Thus, methods that can explore the joint effect of multiple genetic loci or genetic factors could unveil new associations. Second, different types of omics data may give distinctive representations of the overall biological system. By combining multi-omics data, we could potentially aggregate non-overlapping information from each individual data type.
Thus, the focus of this dissertation is on developing and improving computational methods that can jointly model multiple types of genomics data. First, an evaluation of an existing method, the grammatical evolution neural network, was conducted to identify the optimal algorithm settings for the detection of genetic associations. It was found that under certain algorithm settings, the neural networks were restricted to simple one-layer networks. Using a parameter sweep approach, the analysis identified optimal settings that allow for building more flexible network structures. Then, the algorithm was applied to integrate multi-omics data to model drug-induced cytotoxicity for a number of cancer drugs. By combining different types of omics data including SNPs, gene expression, and methylation levels, we were able to model a higher portion of the observed variability than with any individual data type alone. However, one drawback of the existing neural network approach is its limited interpretability. To address this, a new algorithm based on Bayesian networks was created. One novelty of the approach is the ability to independently fit a distinct Bayesian network for each category of a phenotype. This allows for identifying category-specific interactions as well as common interactions across different categories. Analysis using simulated SNP data has shown that the Bayesian network approach outperformed the neural network approach in many settings, particularly in situations where the data contain multiple interacting loci. When applied to a type 2 diabetes dataset, the algorithm was able to identify distinctive interaction patterns between cases and controls. Ultimately, the goal of this dissertation has been to take full advantage of the newly available data to understand the genetic basis of complex traits.

Computational Genetic Approaches for Understanding the Genetic Basis of Complex Traits

Author: Eun Yong Kang
Publisher:
ISBN:
Category :
Languages : en
Pages : 273

Book Description
Recent advances in genotyping and sequencing technology have enabled researchers to collect an enormous amount of high-dimensional genotype data. These large-scale genomic data provide an unprecedented opportunity for researchers to study and analyze the genetic factors of human complex traits. One of the major challenges in analyzing these high-throughput genomic data is the need for effective and efficient computational methodologies. In this thesis, I introduce several methodologies for analyzing these genomic data which facilitate our understanding of the genetic basis of complex human traits. First, I introduce a method for inferring biological networks from high-throughput data containing both genetic variation information and gene expression profiles from genetically distinct strains of an organism. For this problem, I use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. In particular, I utilize the prior biological knowledge that genetic variations affect gene expressions, but not vice versa, which allows us to direct the subsequent edges between two gene expression levels. Predicting both the presence and the absence of a causal relationship between gene expressions can facilitate distinguishing between direct and indirect effects of variation on gene expression levels. I demonstrate the utility of our approach by applying it to a data set containing 112 yeast strains, in which the proposed method identifies the known "regulatory hotspot" in yeast. Second, I introduce an efficient pairwise identity by descent (IBD) association mapping method, which utilizes importance sampling to improve efficiency and enables approximation of extremely small p-values. Two individuals are IBD at a locus if they have identical alleles inherited from a common ancestor. 
One popular approach to finding the association between IBD status and disease phenotype is the pairwise method, where one compares the IBD rate of case/case pairs to the background IBD rate to detect excessive IBD sharing between cases. One challenge of the pairwise method is computational efficiency. In the pairwise method, one uses permutation to approximate p-values because it is difficult to analytically obtain the asymptotic distribution of the statistic. Since the p-value threshold for genome-wide association studies (GWAS) is necessarily low due to multiple testing, one must perform a large number of permutations, which can be computationally demanding. I present Fast-Pairwise to overcome the computational challenges of the traditional pairwise method by utilizing importance sampling to improve efficiency and enable approximation of extremely small p-values. Using the WTCCC type 1 diabetes data, I show that Fast-Pairwise can successfully pinpoint a gene within the MHC region that is known to be associated with the disease. Finally, I introduce a novel meta-analytic approach to identify gene-by-environment interactions by aggregating multiple studies with varying environmental conditions. Identifying environmentally specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally dependent effects. In this project, I jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. 
Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. I apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in high-density lipoprotein (HDL) cholesterol, many of which show significant evidence of involvement in gene-by-environment interactions.
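
The idea that gene-by-environment interaction shows up as heterogeneity across studies can be illustrated with a small sketch. The code below computes Cochran's Q and the I² statistic, two standard meta-analysis heterogeneity measures, from per-study effect estimates at one locus. It is an assumption-laden illustration, not the method of the dissertation: the function name `heterogeneity` and the example effect sizes are hypothetical, and the statistic used in the thesis may differ.

```python
# Illustrative sketch only: Cochran's Q is a standard heterogeneity statistic,
# not necessarily the one used in the dissertation described above.

def heterogeneity(effects, std_errors):
    """Return (pooled_effect, cochran_q, i_squared) for per-study estimates."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    # Fixed-effects pooled estimate of the genetic effect.
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    # Cochran's Q: weighted squared deviation of each study from the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of the variation attributable to heterogeneity, floored at zero.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical effect sizes for one locus measured in different environments.
same = heterogeneity([0.2, 0.2, 0.2], [0.1, 0.1, 0.1])   # consistent effect
differ = heterogeneity([0.0, 0.4], [0.1, 0.1])           # environment-dependent effect
print(same)
print(differ)
```

When the studies share one effect, Q is approximately chi-square distributed with k - 1 degrees of freedom (k being the number of studies), so a large Q relative to that distribution suggests that the effect varies with the environment.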

Genome Mapping and Genomics in Human and Non-Human Primates

Author: Ravindranath Duggirala
Publisher: Springer
ISBN: 3662463067
Category : Science
Languages : en
Pages : 305

Book Description
This book provides an introduction to the latest gene mapping techniques and their applications in biomedical research and evolutionary biology. It especially highlights the advances made in large-scale genomic sequencing. Results of studies that illustrate how the new approaches have improved our understanding of the genetic basis of complex phenotypes including multifactorial diseases (e.g., cardiovascular disease, type 2 diabetes, and obesity), anatomic characteristics (e.g., the craniofacial complex), and neurological and behavioral phenotypes (e.g., human brain structure and nonhuman primate behavior) are presented. Topics covered include linkage and association methods, gene expression, copy number variation, next-generation sequencing, comparative genomics, population structure, and a discussion of the Human Genome Project. Further included are discussions of the use of statistical genetic and genetic epidemiologic techniques to decipher the genetic architecture of normal and disease-related complex phenotypes using data from both humans and non-human primates.

Computational Methods to Analyze Large-scale Genetic Studies of Complex Human Traits

Author: Huwenbo Shi
Publisher:
ISBN:
Category :
Languages : en
Pages : 163

Book Description
Large-scale genome-wide association studies (GWAS) have produced a rich resource of genetic data over the past decade, creating a need for computational and statistical methods to analyze these data. This dissertation presents four statistical methods that model the correlation structure between genetic variants and its effect on GWAS summary association statistics to help understand the genetic basis of complex human traits and diseases. The first method employs the multivariate Bernoulli distribution to model haplotype data, allowing for higher-order interactions among genetic variants, and shows better accuracy in predicting DNase I hypersensitivity status. The second method partitions heritability into small regions of the genome using GWAS summary statistics, while accounting for complex correlation structures among genetic variants, and uncovers the genetic architectures of complex human traits and diseases. Extending the second method to pairs of traits, the third method partitions genetic correlation into small genomic regions using GWAS summary statistics, and provides insights into the shared genetic basis between pairs of traits. Finally, the fourth method dissects population-specific and shared causal genetic variants of complex traits in two continental populations, using GWAS summary statistics obtained from samples of different ethnicities, and reveals differences in the genetic architectures of the two populations.

Integrative Statistical Methods to Understand the Genetic Basis of Complex Trait

Author: Gleb Kichaev
Publisher:
ISBN:
Category :
Languages : en
Pages : 166

Book Description
The genome-wide association study (GWAS) is one of the primary tools for understanding the genetic basis of complex traits. In this dissertation I introduce enhanced statistical methods for integrative GWAS analysis with functional genomic data. First, I describe an integrative fine-mapping framework to prioritize causal variants at known GWAS risk loci. Next, I expand upon this framework to exploit genetic heterogeneity across human populations to improve statistical efficiency. I then consider a new inference strategy to reduce the computational burden of the methodology. Finally, I propose a new approach for GWAS discovery that leverages functional genomic data through polygenic modeling.

Genetic Dissection of Complex Traits

Author: D.C. Rao
Publisher: Academic Press
ISBN: 0080569110
Category : Medical
Languages : en
Pages : 788

Book Description
The field of genetics is rapidly evolving and new medical breakthroughs are occurring as a result of advances in knowledge of genetics. This series continually publishes important reviews of the broadest interest to geneticists and their colleagues in affiliated disciplines. Five sections on the latest advances in complex traits. Methods for testing with ethical, legal, and social implications. Hot topics include discussions on a systems biology approach to drug discovery, using comparative genomics for detecting human disease genes, computationally intensive challenges, and more.

Methods and Models for the Analysis of Genetic Variation Across Species Using Large-scale Genomic Data

Author: Tanya Ngoc Phung
Publisher:
ISBN:
Category :
Languages : en
Pages : 213

Book Description
Understanding how different evolutionary processes shape genetic variation within and between species is an important question in population genetics. The advent of next generation sequencing has allowed for many theories and hypotheses to be tested explicitly with data. However, questions such as what evolutionary processes affect neutral divergence (DNA differences between species) or genetic variation in different regions of the genome (such as on autosomes versus sex chromosomes) or how many genetic variants contribute to complex traits are still outstanding. In this dissertation, I utilized different large-scale genomic datasets and developed statistical methods to determine the role of natural selection in genetic variation between species, the role of sex-biased evolutionary processes in shaping patterns of genetic variation on the X chromosome and autosomes, and how population history, mutation, and natural selection interact to control complex traits. First, I used genome-wide divergence data between multiple pairs of species ranging in divergence time to show that natural selection has reduced divergence at neutral sites that are linked to those under direct selection. To determine explicitly whether and to what extent linked selection and/or mutagenic recombination could account for the pattern of neutral divergence across the genome, I developed a statistical method and applied it to a human-chimp neutral divergence dataset. I showed that a model including both linked selection and mutagenic recombination resulted in the best fit to the empirical data. However, the signal of mutagenic recombination could be coming from biased gene conversion. Comparing genetic diversity between the X chromosome and the autosomes could provide insights into whether and how sex-biased processes have affected genetic variation between different genomic regions. 
For example, an X/A diversity ratio greater than the neutral expectation could be due to more X chromosomes than expected and could be a result of mating practices such as polygamy where there are more reproducing females than males. I next utilized whole-genome sequences from dogs and wolves and found that X/A diversity is lower than the neutral expectation in both dogs and wolves on ancient time scales, arguing for evolutionary processes resulting in more males reproducing compared to females. However, within breed dogs, patterns of population differentiation suggest that there have been more reproducing females, highlighting effects from breeding practices such as the popular sire effect, where one male can father many offspring with multiple females. In medical genetics, a complete understanding of the genetic architecture is essential to unravel the genetic basis of complex traits. While genome-wide association studies (GWAS) have discovered thousands of trait-associated variants and thus have furthered our understanding of the genetic architecture, key parameters such as the number of causal variants and the mutational target size are still under-studied. Further, the role of natural selection in shaping the genetic architecture is still not entirely understood. In the last chapter, I developed a computational method called InGeAr to infer the mutational target size and explore the role of natural selection in affecting the variant's effect on the trait. I found that the mutational target size differs from trait to trait and can be large, up to tens of megabases. In addition, purifying selection is coupled with the variant's effect on the trait. I discussed how these results support the omnigenic model of complex traits. 
In summary, in this dissertation, I utilized different types of large genomic datasets, from genome-wide divergence data to whole-genome sequence data to GWAS data, to develop models and statistical methods to study how different evolutionary processes have shaped patterns of genetic variation across the genome.
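
The "neutral expectation" for the X/A diversity ratio mentioned in this description can be made concrete with the standard effective population size formulas for a population with N_f breeding females and N_m breeding males. The block below is a textbook sketch under idealized assumptions (random mating, neutral sites, equal mutation rates on the X chromosome and the autosomes); it is not taken from the dissertation, and the symbols are introduced here for illustration only.

```latex
% Effective population sizes with N_f breeding females and N_m breeding males
N_e^{A} = \frac{4 N_m N_f}{N_m + N_f},
\qquad
N_e^{X} = \frac{9 N_m N_f}{4 N_m + 2 N_f}

% Expected X/A diversity ratio under neutrality
\frac{N_e^{X}}{N_e^{A}} = \frac{9\,(N_m + N_f)}{8\,(2 N_m + N_f)}

% Special cases
N_m = N_f \;\Rightarrow\; \frac{N_e^{X}}{N_e^{A}} = \frac{3}{4},
\qquad
N_f \gg N_m \;\Rightarrow\; \frac{N_e^{X}}{N_e^{A}} \to \frac{9}{8},
\qquad
N_m \gg N_f \;\Rightarrow\; \frac{N_e^{X}}{N_e^{A}} \to \frac{9}{16}
```

Under these assumptions, a ratio above 3/4 points to more reproducing females than males, and a ratio below 3/4 points to more reproducing males, which matches the interpretation given in the description.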

Scientific Frontiers in Developmental Toxicology and Risk Assessment

Author: National Research Council
Publisher: National Academies Press
ISBN: 0309070864
Category : Nature
Languages : en
Pages : 348

Book Description
Scientific Frontiers in Developmental Toxicology and Risk Assessment reviews advances made during the last 10-15 years in fields such as developmental biology, molecular biology, and genetics. It describes a novel approach for how these advances might be used in combination with existing methodologies to further the understanding of mechanisms of developmental toxicity, to improve the assessment of chemicals for their ability to cause developmental toxicity, and to improve risk assessment for developmental defects. For example, based on the recent advances, even the smallest, simplest laboratory animals such as the fruit fly, roundworm, and zebrafish might be able to serve as developmental toxicological models for human biological systems. Use of such organisms might allow for rapid and inexpensive testing of large numbers of chemicals for their potential to cause developmental toxicity; presently, there are little or no developmental toxicity data available for the majority of natural and manufactured chemicals in use. This new approach to developmental toxicology and risk assessment will require simultaneous research on several fronts by experts from multiple scientific disciplines, including developmental toxicologists, developmental biologists, geneticists, epidemiologists, and biostatisticians.

Assessing Rare Variation in Complex Traits

Author: Eleftheria Zeggini
Publisher: Springer
ISBN: 1493928244
Category : Medical
Languages : en
Pages : 262

Book Description
This book is unique in covering a wide range of design and analysis issues in genetic studies of rare variants, taking advantage of collaboration of the editors with many experts in the field through large-scale international consortia including the UK10K Project, GO-T2D and T2D-GENES. Chapters provide details of state-of-the-art methodology for rare variant detection and calling, imputation and analysis in samples of unrelated individuals and families. The book also covers analytical issues associated with the study of rare variants, such as the impact of fine-scale population structure, and with combining information on rare variants across studies in a meta-analysis framework. Genetic association studies have in the last few years substantially enhanced our understanding of factors underlying traits of high medical importance, such as body mass index, lipid levels, blood pressure and many others. There is growing empirical evidence that low-frequency and rare variants play an important role in complex human phenotypes. This book covers multiple aspects of study design, analysis and interpretation for complex trait studies focusing on rare sequence variation. In many areas of genomic research, including complex trait association studies, technology is in danger of outstripping our capacity to analyse and interpret the vast amounts of data generated. The field of statistical genetics in the whole-genome sequencing era is still in its infancy, but powerful methods to analyse the aggregation of low-frequency and rare variants are now starting to emerge. The chapter Functional Annotation of Rare Genetic Variants is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

From Agriculture Genome to Phenome: Genome-Wide Association, Prediction and Selection

Author: Kefei Chen
Publisher: Frontiers Media SA
ISBN: 2832540619
Category : Science
Languages : en
Pages : 147

Book Description
The advances in “omics” technologies have enabled unprecedented progress in agricultural and biological sciences. The synergy of high-performance computing, high-throughput omics approaches, and high-dimensional phenotyping data with high spatial and temporal resolution has demonstrated the capacity not only to enhance our understanding of biological mechanisms but also to provide powerful insights into dissecting the genetic basis of complex traits of agricultural and economic importance. The genome-wide association study (GWAS) has become a useful approach to identify mutations that underlie diseases and complex traits and has provided important insights in exploring genetic profiles. However, it is less suitable for quantitative traits influenced by a large number of genes with small effects. In addition, false discoveries are a major concern and can be partially attributed to population structure. Genomic selection holds the promise of overcoming these limitations by using whole-genome information to predict the genetic merits of phenotypes. It has been a powerful tool for predicting the breeding values of candidates for selection in breeding populations. One of the challenges of genomic prediction of breeding values with large-p-with-small-n regressions is to develop robust and efficient approaches that accurately predict phenotypic traits as functions of genotypic and environmental inputs. In addition, the integration of multi-omics data in phenotypic prediction would offer the opportunity to understand the flow of information that underlies the phenotypic traits.