Dissecting Genetic Basis of Complex Traits by Haplotype-based Association Studies and Integrated Information from Multiple Data Sources

Author: Yixuan Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 151

Book Description
Characterizing genetic variation and dissecting the genetic architectures of complex diseases are critical to understanding their intrinsic mechanisms. Haplotype methods have shown improved power and more consistent results compared to single-locus approaches. We propose a new haplotype-based association method for family data. Our approach (termed F_HapMiner) first infers the diplotype pair of each individual in each pedigree, assuming no recombination within a family. A phenotype score is then defined for each founder haplotype. Finally, F_HapMiner applies a clustering algorithm to the founder haplotypes based on their similarities and identifies haplotype clusters that show significant associations with diseases/traits. Comparisons with single-locus and haplotype-based Transmission Disequilibrium Test (TDT) methods demonstrate that our approach consistently outperforms the TDT-based approaches regardless of disease model, local Linkage Disequilibrium (LD) structure, or allele/haplotype frequencies. Traditional linkage analysis and association studies may yield hundreds of candidate genes. We propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graph representation. Gene-gene and gene-disease relationships are then defined based on the overall topology of each network using the diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is used to rank all candidate genes. Based on the informativeness of the available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed large-scale cross-validation analysis using three data sources based on protein interactions, gene expression, and pathway information.
Results have shown that our approach consistently outperforms two other state-of-the-art programs. A web tool has been implemented to assist scientists in their genetic studies. Researchers commonly rely on simulated data to evaluate their approaches for detecting high-order interactions in disease gene mapping, so a publicly available simulation program is of great interest. We have developed a computer program, gs, to quickly generate a large number of samples based on real data. Two approaches have been implemented to generate dense SNP haplotype/genotype data whose local LD patterns are similar to those in human populations. The first approach takes haplotype pairs from samples as input, and the second takes patterns of haplotype block structure as input. The improved version of gs provides greater functionality and flexibility for simulating various interaction models. The generated data can serve as a common ground for comparing different approaches to detecting interactions.
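The diffusion kernel step of the gene prioritization framework can be illustrated with a small sketch. It assumes each data source has already been converted into a gene network (an adjacency matrix over the same gene indices), uses the standard diffusion kernel K = exp(-βL) with L the graph Laplacian, and scores each candidate by its summed kernel similarity to known disease genes. The z-score normalization across networks, the value of `beta`, and all function names are hypothetical choices for illustration, not the program's actual method; the adaptive threshold score is not shown. The matrix exponential uses a truncated series that is only adequate for small toy graphs (a library routine such as SciPy's `expm` would be used in practice).

```python
import math

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric (possibly weighted) adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def expm(m, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series (toy sizes only)."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):  # undo the scaling: exp(M) = exp(M / 2**s) ** (2**s)
        result = matmul(result, result)
    return result

def diffusion_kernel(adj, beta=0.5):
    """K = exp(-beta * L): gene-gene similarity reflecting the whole network topology."""
    return expm([[-beta * x for x in row] for row in laplacian(adj)])

def prioritize(networks, disease_genes, candidates, beta=0.5):
    """Hypothetical ranking: per network, sum kernel similarity to known disease genes,
    z-score the candidate scores, then add the z-scores across networks."""
    totals = {c: 0.0 for c in candidates}
    for adj in networks:
        k = diffusion_kernel(adj, beta)
        raw = {c: sum(k[c][d] for d in disease_genes) for c in candidates}
        mean = sum(raw.values()) / len(raw)
        sd = math.sqrt(sum((v - mean) ** 2 for v in raw.values()) / len(raw)) or 1.0
        for c in candidates:
            totals[c] += (raw[c] - mean) / sd
    return sorted(candidates, key=lambda c: totals[c], reverse=True)
```

On a four-gene path network 0-1-2-3 with gene 0 as the known disease gene, `prioritize([path], [0], [1, 2, 3])` ranks the candidates by how closely they are connected to gene 0, returning `[1, 2, 3]`. Because every row of the Laplacian sums to zero, each row of K sums to one, which makes kernel scores comparable across genes within a network.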

Computational Methods to Analyze Large-scale Genetic Studies of Complex Human Traits

Author: Huwenbo Shi
Publisher:
ISBN:
Category :
Languages : en
Pages : 163

Book Description
Large-scale genome-wide association studies (GWAS) have produced a rich resource of genetic data over the past decade, creating an urgent need for computational and statistical methods to analyze these data. This dissertation presents four statistical methods that model the correlation structure between genetic variants and its effect on GWAS summary association statistics, to help understand the genetic basis of complex human traits and diseases. The first method employs the multivariate Bernoulli distribution to model haplotype data, allowing for higher-order interactions among genetic variants, and achieves better accuracy in predicting DNase I hypersensitivity status. The second method partitions heritability into small regions of the genome using GWAS summary statistics, while accounting for complex correlation structures among genetic variants, and uncovers the genetic architectures of complex human traits and diseases. Extending the second method to pairs of traits, the third method partitions genetic correlation into small genomic regions using GWAS summary statistics and provides insights into the shared genetic basis between pairs of traits. Finally, the fourth method dissects population-specific and shared causal genetic variants of complex traits in two continental populations, using GWAS summary statistics obtained from samples of different ethnicities, and reveals differences in the genetic architectures of the two populations.
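To show how a multivariate Bernoulli distribution can encode interactions of any order among SNPs on a haplotype, here is a minimal sketch of its log-linear form, log P(x) = Σ_S f_S ∏_{j∈S} x_j - log Z, evaluated by enumerating all 2^p haplotypes. The parameter values are hypothetical, the function name is illustrative, and neither the model fitting nor the DNase I prediction task from the dissertation is shown.

```python
from itertools import product
import math

def mvb_probs(p, params):
    """Probability of every binary haplotype of length p under a multivariate
    Bernoulli model. `params` maps a tuple of SNP indices S (a main effect such
    as (0,), a pairwise term such as (0, 1), or any higher-order term) to f_S."""
    haps = list(product((0, 1), repeat=p))
    weights = []
    for h in haps:
        # A term contributes only when every SNP in S carries allele 1.
        score = sum(f for idx, f in params.items() if all(h[j] == 1 for j in idx))
        weights.append(math.exp(score))
    z = sum(weights)  # normalizing constant Z
    return {h: w / z for h, w in zip(haps, weights)}
```

For two SNPs, the interaction parameter is exactly the log odds ratio: with `params = {(0, 1): 1.0}`, P(11)P(00) / (P(10)P(01)) equals e. A third-order term such as `(0, 1, 2)` raises the probability of the haplotype carrying all three alleles beyond what pairwise terms alone can express, which is the kind of higher-order dependence the description refers to.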

Using Large-scale Genomics Data to Understand the Genetic Basis of Complex Traits

Author: Ruowang Li
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
With the arrival of big data in genetics over the past decade, the field has experienced drastic changes. One game-changing breakthrough was the invention of genotyping and sequencing technology that allows researchers to examine single nucleotide polymorphisms (SNPs) across the entire genome. The other major breakthrough was the identification of haplotypes of common alleles in major human populations, which permitted the design of genotyping assays that effectively cover the entire human genome at a resolution appropriate for genetic mapping. Together, these technological breakthroughs have permitted researchers to carry out Genome Wide Association Studies (GWAS) on a wide range of traits including, for example, height and disease status. With GWAS, causal SNPs have been identified for some Mendelian traits, but for more complex genetic traits, the heritability explained by the associated SNPs is low. In addition, high-throughput technologies to generate other types of -omics data, such as gene expression, DNA methylation, and protein levels, have also emerged recently. How best to utilize SNP data and other multi-omics data to understand genetic traits is one of the most important questions in the field today. With the increasing prevalence of multi-omics data, new types of analysis schemes and tools are needed to handle the additional complexity of the data. In particular, two areas of method development are greatly needed. First, statistical methods employed by GWAS do not consider the potential interactions among genetic loci. Thus, methods that can explore the joint effect of multiple genetic loci or genetic factors could unveil new associations. Second, different types of omics data may give distinctive representations of the overall biological system. By combining multi-omics data, we could potentially aggregate non-overlapping information from each individual data type.
Thus, the focus of this dissertation is on developing and improving computational methods that can jointly model multiple types of genomics data. First, an existing method, grammatical evolution neural network, was evaluated to identify the optimal algorithm settings for detecting genetic associations. It was found that under certain algorithm settings, the neural networks were restricted to simple one-layer networks. Using a parameter sweep approach, the analysis identified optimal settings that allow for building more flexible network structures. The algorithm was then applied to integrate multi-omics data to model drug-induced cytotoxicity for a number of cancer drugs. By combining different types of omics data, including SNPs, gene expression, and methylation levels, we were able to model a larger portion of the observed variability than with any individual data type alone. However, one drawback of the existing neural network approach is its limited interpretability. To address this, a new algorithm based on Bayesian Networks was created. One novelty of the approach is the ability to independently fit a distinct Bayesian Network for each category of a phenotype. This allows for identifying category-specific interactions as well as interactions common across categories. Analysis using simulated SNP data has shown that the Bayesian Network approach outperformed the Neural Network approach in many settings, particularly in situations where the data contain multiple interacting loci. When applied to a type 2 diabetes dataset, the algorithm was able to identify distinctive interaction patterns between cases and controls. Ultimately, the goal of this dissertation has been to take full advantage of newly available data to understand the genetic basis of complex traits.
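The idea of fitting a separate Bayesian Network per phenotype category can be sketched with the simplest structure learner available, a Chow-Liu tree: a tree-structured Bayesian network whose skeleton is the maximum spanning tree over pairwise mutual information. This is a swapped-in technique chosen for brevity, not the dissertation's algorithm, and the function names are hypothetical. Comparing the trees learned for cases and controls highlights dependencies that appear in only one category.

```python
from collections import Counter
from itertools import combinations
import math

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def chow_liu_edges(rows):
    """Skeleton of a Chow-Liu tree: maximum spanning tree over pairwise mutual
    information, built with Kruskal's algorithm and a union-find structure."""
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    weighted = sorted(((mutual_information(cols[i], cols[j]), i, j)
                       for i, j in combinations(range(p), 2)), reverse=True)
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = set()
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.add((i, j))
    return edges

def per_category_networks(genotypes, phenotype):
    """Fit one tree-structured network per phenotype category (e.g. cases vs controls)."""
    groups = {}
    for g, y in zip(genotypes, phenotype):
        groups.setdefault(y, []).append(g)
    return {y: chow_liu_edges(rows) for y, rows in groups.items()}
```

Edges present in one category's tree but absent from the other's are candidate category-specific interactions. A spanning tree must connect every SNP, so edges with near-zero mutual information can still appear; a real analysis would need a significance criterion and a richer structure search than a tree.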

Haplotype-based Association Mapping Complements SNP-based Approaches as a Powerful Tool to Analyze the Genetic Basis of Complex Traits

Author: Fang Liu
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Association mapping, SNP-based GWAS, FH-based GWAS, haplotype, epistasis, leaf rust, hybrid wheat, independent validation, predictability.

Molecular Dissection of Complex Traits

Author: Andrew H. Paterson
Publisher: CRC Press
ISBN: 0429525419
Category : Science
Languages : en
Pages : 413

Book Description
In the past 10 years, contemporary geneticists using new molecular tools have been able to resolve complex traits into individual genetic components and describe each such component in detail. Molecular Dissection of Complex Traits summarizes the state of the art in molecular analysis of complex traits (QTL mapping), placing new developments in thi

Systems Genetics

Author: Florian Markowetz
Publisher: Cambridge University Press
ISBN: 131638098X
Category : Science
Languages : en
Pages : 287

Book Description
Whereas genetic studies have traditionally focused on explaining the inheritance of single traits and their phenotypes, recent technological advances have made it possible to comprehensively dissect the genetic architecture of complex traits and quantify how genes interact to shape phenotypes. This exciting new area has been termed systems genetics; it was born out of a synthesis of multiple fields, integrating a range of approaches and exploiting our increased ability to obtain quantitative and detailed measurements on a broad spectrum of phenotypes. Gathering the contributions of leading scientists, both computational and experimental, this book shows how experimental perturbations can help us understand the link between genotype and phenotype. A snapshot of current research activity and state-of-the-art approaches to systems genetics is provided, including work on model organisms such as Saccharomyces cerevisiae and Drosophila melanogaster, as well as human studies.

Statistical Methods for Genetic Association Mapping of Complex Traits with Related Individuals

Author: Zuoheng Wang
Publisher:
ISBN: 9781109314212
Category :
Languages : en
Pages : 88

Book Description
We develop statistical methods to address both dependent and partially-observed data and apply these methods to problems in haplotype-based association analysis of complex traits in related individuals. We consider a general setting in which the complete data are dependent with marginal distributions following a generalized linear model. We form a vector Z whose elements are conditional expectations of the elements of the complete-data vector, given selected functions of the incomplete data. Assuming that the covariance matrix of Z is available, we form an optimal linear estimating function based on Z, which we solve by an iterative method. This approach allows us to address key difficulties in the haplotype frequency estimation and testing problems in related individuals: (1) dependence that is known but can be complicated; (2) data that are incomplete for structural reasons, as well as possibly missing, with different amounts of information for different observations; (3) the need for computational speed in order to analyze large numbers of markers; (4) a well-established null model, but an alternative model that is unknown and is problematic to fully specify in related individuals. We apply the method to test for association of haplotypes with alcoholism in the GAW 14 COGA data set.

Genetic Dissection of Complex Traits

Author: D.C. Rao
Publisher: Academic Press
ISBN: 0080569110
Category : Medical
Languages : en
Pages : 788

Book Description
The field of genetics is rapidly evolving, and new medical breakthroughs are occurring as a result of advances in knowledge of genetics. This series continually publishes important reviews of the broadest interest to geneticists and their colleagues in affiliated disciplines. This volume offers five sections on the latest advances in complex traits, and covers methods for testing with ethical, legal, and social implications. Hot topics include discussions of a systems biology approach to drug discovery, using comparative genomics to detect human disease genes, computationally intensive challenges, and more.

Next Generation Genome-wide Association Studies in Complex Human Quantitative Traits

Author: Andrew Robert Wood
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Since 2005, genome-wide association (GWA) studies have dominated the field of complex traits. Genetic and environmental factors play a role in causing disease and influencing the variance of a quantitative trait. GWA is a hypothesis-free approach that follows on from candidate gene and linkage studies and has markedly increased the number of loci associated with complex traits. Despite the relative success of GWA studies in identifying several hundred phenotypic associations, the genetic component of most complex traits remains largely unaccounted for. The field has now begun to focus its efforts on the "missing heritability" to enhance the understanding of genetics and of the biological pathways that underlie the aetiology of complex phenotypes. This thesis presents a series of studies that attempt to address this issue by exploring other sources of variation and statistical models that have not been extensively addressed in GWA studies to date. Chapter 1 is an introduction to genome-wide association studies. In particular, it describes the origins of these studies, what we have learnt from them, and their limitations. Chapter 2 describes a study that shows how multiple signals within a single locus can explain more of the genetic component of a complex trait, using gene expression as a model trait. Chapter 3 describes a study that tests for deviation from additivity (an assumption of most GWA studies to date) through dominant, recessive, and gene-gene interaction analyses, using height, body mass index, and waist-hip ratio (adjusted for BMI) as model phenotypes. Chapter 4 describes a study that examines how more signals may be identified by increasing the density of variants through 1000 Genomes-based imputation compared to HapMap-based imputation. I use 93 phenotypes, all circulating factors, including proteins, ions, and vitamins.
Chapter 5 describes a study that tests whether more association signals can be discovered through low-coverage whole-genome sequencing. In particular, I compare association testing based on 1000 Genomes imputation with testing based on sequencing. I use gene expression as a model trait. Chapter 6 discusses the research findings from the previous chapters, presents conclusions, and describes future research plans in the field of complex traits for a fuller understanding of the role of genetics.
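The additive, dominant, and recessive models mentioned for Chapter 3 differ only in how a genotype (the count of effect alleles, 0, 1, or 2) is encoded as a predictor. The sketch below shows those encodings and compares how much trait variance each one explains in a simple least-squares fit. This is an illustration of the codings only, with hypothetical function names and toy data; it is not a formal test of deviation from additivity, which would use regression with covariates and proper p-values.

```python
def encode(genotype, model):
    """Map an effect-allele count (0, 1 or 2) to a predictor under a genetic model."""
    if model == "additive":
        return genotype            # each allele adds the same effect
    if model == "dominant":
        return int(genotype >= 1)  # one copy is enough
    if model == "recessive":
        return int(genotype == 2)  # two copies are required
    raise ValueError(f"unknown model: {model}")

def r_squared(x, y):
    """Variance in y explained by a simple least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def best_model(genotypes, trait):
    """Return the coding that explains the most variance, plus all R^2 values."""
    scores = {m: r_squared([encode(g, m) for g in genotypes], trait)
              for m in ("additive", "dominant", "recessive")}
    return max(scores, key=scores.get), scores
```

For example, a trait that is raised only in individuals carrying two effect alleles (`[1, 1, 1, 1, 3, 3]` for genotypes `[0, 0, 1, 1, 2, 2]`) is fit perfectly by the recessive coding, while the additive coding leaves residual variance; this is the kind of non-additive pattern a study of dominance and recessivity looks for.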

Genetics and Genomics of Brachypodium

Author: John P. Vogel
Publisher: Springer
ISBN: 3319269445
Category : Science
Languages : en
Pages : 354

Book Description
Grasses dominate many natural ecosystems and produce the bulk of the calories consumed by humans, either directly in the form of grains or indirectly through forage- or grain-fed animals. In addition, grasses grown as biomass crops are poised to become a significant source of renewable energy. Despite their economic and environmental importance, research into the unique aspects of grass biology has been hampered by the lack of a truly tractable experimental model system. Over the past decade, the small, annual grass Brachypodium distachyon has emerged as a viable model system for the grasses. This book describes the development of extensive experimental resources (e.g. whole genome sequence, efficient transformation methods, insertional mutant collections, large germplasm collections, recombinant inbred lines, resequenced genomes) that have led many laboratories around the world to adopt B. distachyon as a model system. The use of B. distachyon to address a wide range of biological topics (e.g. disease resistance, cell wall composition, abiotic stress tolerance, root growth and development, floral development, natural diversity) is also discussed.