Methods and Models for the Analysis of Genetic Variation Across Species Using Large-scale Genomic Data

Author: Tanya Ngoc Phung
Publisher:
ISBN:
Category :
Languages : en
Pages : 213

Book Description
Understanding how different evolutionary processes shape genetic variation within and between species is an important question in population genetics. The advent of next generation sequencing has allowed for many theories and hypotheses to be tested explicitly with data. However, questions such as what evolutionary processes affect neutral divergence (DNA differences between species) or genetic variation in different regions of the genome (such as on autosomes versus sex chromosomes) or how many genetic variants contribute to complex traits are still outstanding. In this dissertation, I utilized different large-scale genomic datasets and developed statistical methods to determine the role of natural selection on genetic variation between species, sex-biased evolutionary processes on shaping patterns of genetic variation on the X chromosome and autosomes, and how population history, mutation, and natural selection interact to control complex traits. First, I used genome-wide divergence data between multiple pairs of species ranging in divergence time to show that natural selection has reduced divergence at neutral sites that are linked to those under direct selection. To determine explicitly whether and to what extent linked selection and/or mutagenic recombination could account for the pattern of neutral divergence across the genome, I developed a statistical method and applied it to a human-chimp neutral divergence dataset. I showed that a model including both linked selection and mutagenic recombination resulted in the best fit to the empirical data. However, the signal of mutagenic recombination could be coming from biased gene conversion. Comparing genetic diversity between the X chromosome and the autosomes could provide insights into whether and how sex-biased processes have affected genetic variation between different genomic regions.
For example, an X/A diversity ratio greater than the neutral expectation could be due to more X chromosomes than expected and could be a result of mating practices such as polygamy where there are more reproducing females than males. I next utilized whole-genome sequences from dogs and wolves and found that X/A diversity is lower than the neutral expectation in both dogs and wolves on ancient time-scales, arguing for evolutionary processes resulting in more males reproducing compared to females. However, within breed dogs, patterns of population differentiation suggest that there have been more reproducing females, highlighting effects from breeding practices such as the popular sire effect, where one male can father many offspring with multiple females. In medical genetics, a complete understanding of the genetic architecture is essential to unravel the genetic basis of complex traits. While genome-wide association studies (GWAS) have discovered thousands of trait-associated variants and thus have furthered our understanding of the genetic architecture, key parameters such as the number of causal variants and the mutational target size are still under-studied. Further, the role of natural selection in shaping the genetic architecture is still not entirely understood. In the last chapter, I developed a computational method called InGeAr to infer the mutational target size and explore the role of natural selection in affecting the variant's effect on the trait. I found that the mutational target size differs from trait to trait and can be large, up to tens of megabases. In addition, purifying selection is coupled with the variant's effect on the trait. I discussed how these results support the omnigenic model of complex traits.
In summary, in this dissertation, I utilized different types of large genomic datasets, from genome-wide divergence data to whole-genome sequence data to GWAS data, to develop models and statistical methods to study how different evolutionary processes have shaped patterns of genetic variation across the genome.
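
The "neutral expectation" for the X/A diversity ratio mentioned in this description can be illustrated with a short calculation. The sketch below is not the dissertation's method. It uses the textbook formulas for autosomal and X-linked effective population size under an unequal breeding sex ratio, and it assumes constant population size, neutrality, equal mutation rates on the X and the autosomes, and Poisson-distributed offspring numbers. The function name `xa_ne_ratio` is hypothetical.

```python
def xa_ne_ratio(n_females: float, n_males: float) -> float:
    """Expected X-to-autosome ratio of effective population size.

    Textbook formulas (an assumption of this sketch, not taken from
    the dissertation):
        Ne_autosome = 4 * Nm * Nf / (Nm + Nf)
        Ne_X        = 9 * Nm * Nf / (4 * Nm + 2 * Nf)
    """
    if n_females <= 0 or n_males <= 0:
        raise ValueError("breeding numbers must be positive")
    ne_auto = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_auto


if __name__ == "__main__":
    # Equal numbers of breeding females and males: the 3/4 expectation.
    print(xa_ne_ratio(100, 100))
    # More breeding females than males pushes the ratio above 3/4.
    print(xa_ne_ratio(1000, 100))
    # More breeding males than females pushes it below 3/4.
    print(xa_ne_ratio(100, 1000))
```

Under these assumptions the ratio is bounded between 9/16 (males greatly outnumber females) and 9/8 (females greatly outnumber males), which is why a ratio far from 3/4 is usually read as a signal of sex-biased processes or of other forces such as selection or demography.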

Population Genomics

Author: Om P. Rajora
Publisher: Springer
ISBN: 3030045897
Category : Science
Languages : en
Pages : 822

Book Description
Population genomics has revolutionized various disciplines of biology including population, evolutionary, ecological and conservation genetics, plant and animal breeding, human health, medicine and pharmacology by making it possible to address novel and long-standing questions with unprecedented power and accuracy. It employs large-scale or genome-wide genetic information and bioinformatics to address various fundamental and applied aspects in biology and related disciplines, and provides a comprehensive genome-wide perspective and new insights that were not possible before. These advances have become possible due to the development of new and low-cost sequencing and genotyping technologies and novel statistical approaches and software, bioinformatics tools, and models. Population genomics is tremendously advancing our understanding of the roles of evolutionary processes, such as mutation, genetic drift, gene flow, and natural selection, in shaping genetic variation at individual loci and across the genome and populations; improving the assessment of population genetic parameters or processes such as adaptive evolution, effective population size, gene flow, admixture, inbreeding and outbreeding depression, demography, and biogeography; resolving evolutionary histories and phylogenetic relationships of extant, ancient and extinct species; understanding the genomic basis of fitness, adaptation, speciation, complex ecological and economically important traits, and disease and insect resistance; facilitating forensics, genetic medicine and pharmacology; delineating conservation genetic units; and understanding the genetic effects of resource management practices, and assisting conservation and sustainable management of genetic resources. This Population Genomics book discusses the concepts, approaches, applications and promises of population genomics in addressing most of the above fundamental and applied crucial aspects in a variety of organisms from microorganisms to humans.
The book provides insights into a range of emerging population genomics topics including population epigenomics, landscape genomics, seascape genomics, paleogenomics, ecological and evolutionary genomics, biogeography, demography, speciation, admixture, colonization and invasion, genomic selection, and plant and animal domestication. This book fills a vacuum in the field and is expected to become a primary reference in Population Genomics world-wide.

Data Production and Analysis in Population Genomics

Author: Francois Pompanon
Publisher: Humana Press
ISBN: 9781617798719
Category : Medical
Languages : en
Pages : 337

Book Description
Population genomics is a recently emerged discipline, which aims at understanding how evolutionary processes influence genetic variation across genomes. Today, in the era of cheaper next-generation sequencing, it is no longer as daunting to obtain whole genome data for any species of interest and population genomics is now conceivable in a wide range of fields, from medicine and pharmacology to ecology and evolutionary biology. However, because of the lack of a reference genome and of enough a priori data on the polymorphism, population genomics analyses of populations will still involve higher constraints for researchers working on non-model organisms, as regards the choice of the genotyping/sequencing technique or that of the analysis methods. Therefore, Data Production and Analysis in Population Genomics purposely puts emphasis on protocols and methods that are applicable to species where genomic resources are still scarce. It is divided into three convenient sections, each one tackling one of the main challenges facing scientists setting up a population genomics study. The first section helps devise a sampling and/or experimental design suitable to address the biological question of interest. The second section addresses how to implement the best genotyping or sequencing method to obtain the required data given the time and cost constraints as well as the other genetic resources already available. Finally, the last section is about making the most of the (generally huge) dataset produced by using appropriate analysis methods in order to reach a biologically relevant conclusion. Written in the successful Methods in Molecular Biology™ series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible protocols, advice on methodology and implementation, and notes on troubleshooting and avoiding known pitfalls.
Authoritative and easily accessible, Data Production and Analysis in Population Genomics serves a wide readership by providing guidelines to help choose and implement the best experimental or analytical strategy for a given purpose.

Methods and Analysis of Genome-scale Gene Family Evolution Across Multiple Species

Author: Matthew David Rasmussen
Publisher:
ISBN:
Category :
Languages : en
Pages : 136

Book Description
The fields of genomics and evolution have continually benefited from one another in their common goal of understanding the biological world. This partnership has been accelerated by ever increasing sequencing and high-throughput technologies. Although the future of genomic and evolutionary studies is bright, new models and methods will be needed to address the growing and changing challenges of large-scale datasets. In this work, I explore how evolution generates the diversity of life we see in modern species, specifically the evolution of new genes and functions. By reconstructing the history of the diverse sequences present in modern species, we can improve our understanding of their function and evolutionary importance. Performing such an analysis requires a principled and efficient means of computing the most probable evolutionary scenarios. To address these challenges, I introduce a new model of gene family evolution as well as a new method SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss rates, speciation times, and correlated substitution rate variation across both species and loci. I have implemented and applied this method on two clades of fully-sequenced species, 12 Drosophila and 16 fungal genomes as well as simulated phylogenies, and find dramatic improvements in reconstruction accuracy as compared to the most popular existing methods, including those that take the species tree into account. Lastly, I use the SPIMAP method to reconstruct the evolutionary history of all gene families in 16 fungal species including several relatives of the pathogenic species C. albicans. From these reconstructions, we identify several families enriched with duplications and positive selection in pathogenic lineages. 
These reconstructions shed light on the evolution of these species and provide a better understanding of the genes involved in pathogenicity.

Eucalypt Ecology

Author: Jann Elizabeth Williams
Publisher: Cambridge University Press
ISBN: 9780521497404
Category : Gardening
Languages : en
Pages : 460

Book Description
The dominant trees of Australia, eucalypts make up a remarkable genus. This authoritative volume provides current reviews by active researchers of many disciplines, including evolutionary history, genetics, distribution and modelling, the relationship of eucalypts to fire and nutrients, ecophysiology, pollination and reproductive ecology, interactions between eucalypts and other co-existing biota (including fungi, invertebrates and vertebrates), and conservation and management. Together these reviews shed light on the reasons for the great success of eucalypts in Australian environments, and provide a comprehensive summary for comparison with the ecology of major woody plant genera in other continents. This volume is of particular relevance to Australian ecologists, but also provides a stimulating perspective to students of vegetation ecology in all continents.

Next Steps for Functional Genomics

Author: National Academies of Sciences, Engineering, and Medicine
Publisher: National Academies Press
ISBN: 0309676738
Category : Science
Languages : en
Pages : 201

Book Description
One of the holy grails in biology is the ability to predict functional characteristics from an organism's genetic sequence. Despite decades of research since the first sequencing of an organism in 1995, scientists still do not understand exactly how the information in genes is converted into an organism's phenotype, its physical characteristics. Functional genomics attempts to make use of the vast wealth of data from "-omics" screens and projects to describe gene and protein functions and interactions. A February 2020 workshop was held to determine research needs to advance the field of functional genomics over the next 10-20 years. Speakers and participants discussed goals, strategies, and technical needs to allow functional genomics to contribute to the advancement of basic knowledge and its applications that would benefit society. This publication summarizes the presentations and discussions from the workshop.

Genetic Dissection of Complex Traits

Author: D.C. Rao
Publisher: Academic Press
ISBN: 0080569110
Category : Medical
Languages : en
Pages : 788

Book Description
The field of genetics is rapidly evolving and new medical breakthroughs are occurring as a result of advances in knowledge of genetics. This series continually publishes important reviews of the broadest interest to geneticists and their colleagues in affiliated disciplines.
Five sections on the latest advances in complex traits.
Methods for testing with ethical, legal, and social implications.
Hot topics include discussions on a systems biology approach to drug discovery; using comparative genomics for detecting human disease genes; computationally intensive challenges, and more.

Models and Tools for Studying Genetic and Cultural Variation

Author: Ethan Macneil Jewett
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
A fundamental goal of population genetics is to understand how historical events and processes, such as speciation, migration, natural selection, and drift, have contributed to genetic variation among modern individuals. In humans, factors that contribute to genetic variation also include cultural phenomena and practices such as marriage customs and membership in cultural or linguistic groups that act as either barriers to, or catalysts for, contact and mating. Mathematical models of genetic evolution can be applied to make inferences about factors that have influenced genetic variation among populations over time. Analyses of cultural data can enhance these analyses by identifying cultural phenomena that have contributed to contact and isolation among populations and by providing an additional source of information that can be used to infer demographic histories. In this thesis, I first describe work on mathematical modeling approaches that can be used to infer the historical relationships among populations, and to model the effects of these relationships on present-day genetic diversity. Next, I describe empirical analyses of cultural variation that shed light on recent cultural, geographic, and demographic factors that have influenced both cultural and genetic diffusion among populations. The first three chapters focus on mathematical models of genetic variation. In Chapter 1, I apply a coalescent model to reduce the expected error in an existing algorithm (the GLASS method) for inferring the historical relationships among populations or species. The new method I develop provides fast and accurate estimates of the topological and temporal relationships among a set of extant populations. These estimates can be used to obtain accurate null models for downstream analyses, such as comparative genetic studies to identify signals of adaptation. 
In Chapter 2, I extend the model of Chapter 1 to derive expressions for the theoretical accuracy of algorithms that perform genotype imputation, a key component of many genome-wide association studies of the genetic bases of phenotypic traits. The expressions I derive can be used to guide sampling designs for collecting panels of imputation reference haplotypes, thus improving the power of genome-wide association studies to detect the genetic variants that underlie phenotypic variation. Coalescent models like those presented in chapters 1 and 2 can be computationally difficult to implement on modern genomic data sets with many sampled individuals. The complexity of these computations can be reduced by making use of an approximation to the coalescent model in which the number of ancestral alleles in a population is assumed to change deterministically as time moves backwards. In Chapter 3, I describe general procedures for applying this deterministic approximation to obtain functionally simple and computationally fast approximations to coalescent formulas that are otherwise challenging to compute on data sets with many sampled individuals. In chapters 4 and 5, I present empirical analyses designed to identify cultural factors that can affect genetic variation, as well as factors that can affect both genetic and cultural differences among populations. In Chapter 4, I present an analysis of linguistic and cultural diversity in the United States, with the goal of understanding factors that have contributed to cultural isolation and diffusion among demographic groups over time. In Chapter 5, I present a joint analysis of genetic and linguistic data in Cape Verde, an archipelago near the coast of western Africa with a long history of genetic and linguistic admixture between European and African populations. 
This analysis sheds light on demographic and geographical factors that affect both genetic and linguistic variation, and on the degree to which linguistic inheritance parallels genetic inheritance. The modeling approaches, theory, and analyses presented in this thesis provide a set of tools that facilitate studies of the factors that affect genetic and cultural variation within and among populations.
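
The deterministic approximation described for Chapter 3 can be illustrated with a small sketch. This is not the thesis's derivation. It assumes the approximation takes the familiar form in which the number of ancestral lineages n(t) follows dn/dt = -n(n-1)/2, with time t measured backwards in coalescent units (each pair of lineages coalesces at rate 1). That equation has the closed-form solution n(t) = n0 / (n0 - (n0 - 1) e^(-t/2)). The function name `expected_lineages` is hypothetical.

```python
import math


def expected_lineages(n0: int, t: float) -> float:
    """Deterministic approximation to the number of ancestral lineages.

    Solves dn/dt = -n(n-1)/2 with n(0) = n0, where t is time into the
    past in coalescent units. The ODE form is an assumption of this
    sketch; the thesis may use a different or more general version.
    """
    if n0 < 1:
        raise ValueError("n0 must be at least 1")
    if t < 0:
        raise ValueError("t must be non-negative")
    return n0 / (n0 - (n0 - 1) * math.exp(-t / 2))


if __name__ == "__main__":
    # Lineage counts for a sample of 1000 at a few times in the past.
    for t in (0.0, 0.01, 0.1, 1.0, 5.0):
        print(t, expected_lineages(1000, t))
```

The appeal of this form is that evaluating it costs the same for a sample of ten as for a sample of ten thousand, whereas exact coalescent formulas for the distribution of the number of lineages involve long alternating sums that become numerically unstable for large samples.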

Imaging Genetics

Author: Adrian Dalca
Publisher: Academic Press
ISBN: 0128139692
Category : Technology & Engineering
Languages : en
Pages : 202

Book Description
Imaging Genetics presents the latest research in imaging genetics methodology for discovering new associations between imaging and genetic variables, providing an overview of the state-of-the-art in the field. Edited and written by leading researchers, this book is a beneficial reference for students and researchers, both new and experienced, in this growing area. The field of imaging genetics studies the relationships between DNA variation and measurements derived from anatomical or functional imaging data, often in the context of a disorder. While traditional genetic analyses rely on classical phenotypes like clinical symptoms, imaging genetics can offer richer insights into underlying, complex biological mechanisms.
Contains an introduction describing how the field has evolved to the present, together with perspectives on its future direction and challenges.
Describes novel application domains and analytic methods that represent the state-of-the-art in the burgeoning field of imaging genetics.
Introduces a novel, large-scale analytic framework that involves multi-site, image-wide, genome-wide associations.

Statistical Methods for Genome Variant Calling and Population Genetic Inference from Next-generation Sequencing Data

Author: Xin Ma
Publisher:
ISBN:
Category :
Languages : en
Pages : 226

Book Description
Next Generation Sequencing (NGS) technology has been widely adopted as a platform for DNA sequence variation detection and hence, accurate and rapid detection of genome variations using NGS data is critical for population genetics analyses. In my dissertation, I present three models that I developed to detect genome variation with high accuracy. In Chapter 2, I analyzed sequence data in orang-utan. The orang-utan species, Pongo pygmaeus (Bornean) and Pongo abelii (Sumatran), are great apes found on the islands of Borneo and Sumatra. Populations on both islands are from the same ancestry but were subsequently isolated after the split. Due to recent deforestation on both islands, these species are critically endangered. Knowing their demographical history will not only help us better protect them, but it will provide us with a higher resolution evolutionary map for primates. It will also give us a powerful perspective on hominid biology because orang-utans are the most phylogenetically distant great apes from humans. In this study, we have sampled five wild-caught orang-utans from each of the two populations. One individual was sequenced to 20X coverage; the rest have median coverages between 6-8X. I developed a Bayesian population genomic variation detection tool which not only captures the population structure between these two populations but also pools all the allele frequency information among all individuals within the same population to boost the power of the variation detection in low coverage individuals. Our analysis revealed that, compared to other primates, the orang-utan genome has many unique features. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation.
Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period with more deleterious mutation accumulation. Despite some evidence for stronger negative selection in Sumatran orang-utans, detecting patterns of selection by fitting different selection models upon the baseline demographical model with nonsynonymous SNPs using ∂a∂i showed that the distribution of selection forces is actually similar to that in human with roughly 80% of mutations having a selection coefficient more negative than s ≈ 3 × 10⁻⁵. In Chapter 3, I undertook a second project aimed at understanding the molecular mechanisms that lead to mutation variation in yeast. This work is likely to provide insights not only in molecular evolution but also in understanding human disease progression. To analyze with limited bias genomic features associated with DNA polymerase errors, we performed a genome-wide analysis of mutations that accumulate in mismatch repair (MMR) deficient diploid lines of Saccharomyces cerevisiae. These lines were derived from a common ancestor and were grown for 160 generations, with bottlenecks reducing the population to one cell every twenty generations. We sequenced one wild-type and three mutator lines at coverages from eight- to twenty-fold using Illumina Solexa 36-bp single reads. Using an experimentally aware Bayesian genotype caller developed to pool experimental data across sequencing runs for all strains, we detected 28 heterozygous single-nucleotide polymorphisms (SNPs) and 48 single nucleotide (nt) insertion/deletions (indels) from the data set.
This method was evaluated on simulated data sets and found to have a very low false positive rate (~6 × 10⁻⁵) and a false negative rate of 0.08 within the unique (i.e., non-repetitive) mapping regions of the genome that contained at least sevenfold coverage. The heterozygous mutations identified by the Bayesian genotype caller were confirmed by Sanger sequencing. Our findings are interesting because frameshift mutations in homopolymer (HP) tracts, which are present at high levels in the yeast genome (> 77,400 for five to twenty nt HP tracts), are likely to disrupt gene function and further demonstrate that the mutation pattern seen previously in mismatch repair defective strains using a limited number of reporters holds true for the entire genome. In Chapter 4, I presented an analysis of mutation hotspots in yeast deficient in DNA mismatch repair (MMR). Classical evolutionary theory assumes that mutations occur randomly in the genome; however, studies performed in a variety of organisms indicate the existence of context-dependent mutational biases. All of these biases involve local sequence context (e.g., increased rate of cytosine deamination at methylated CpG's in mammals), but the source of mutagenesis variation across larger genomic contexts (e.g., tens or hundreds of bases) has not been identified. Therefore, we use high-coverage whole genome sequencing (>200X coverage) of progenitor and derived conditional MMR mutant line of diploid yeast to confidently identify 92 mutations that accumulated after 160 generations of vegetative growth using a log-likelihood ratio test. We found that the 73 single and double bp insertion/deletion mutations accumulate much more frequently in homopolymeric poly-A and poly-T tracts with all mutations occurring at sites with at least 5 hp runs.
Surprisingly, we demonstrated that the likelihood of an indel mutation in a given poly (dA:dT) homopolymeric tract is increased by the presence of nearby poly (dA:dT) tracts in up to a 1000 bp region centered on the given tract. Furthermore, we identified nine positions that were mutated independently in at least two replicate lines and these all occurred at sites with at least 8 homopolymeric runs, suggesting greater instability for higher poly A_n or poly T_n sites. Our work suggests that specific mutation hotspots can contribute disproportionately to the genetic variation that is introduced into populations, and provides the first long-range genomic sequence context that contributes to mutagenesis.
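
The idea of pooling allele-frequency information to help call genotypes in low-coverage individuals, described for Chapter 2, can be sketched in a few lines. This is an illustration only and not the dissertation's caller. It assumes a biallelic site, a Hardy-Weinberg prior built from a population allele frequency, a binomial model for read counts, and a single symmetric sequencing error rate. The function name `genotype_posteriors` and its parameters are hypothetical.

```python
import math


def genotype_posteriors(alt_reads: int, total_reads: int,
                        alt_freq: float, error_rate: float = 0.01):
    """Posterior over genotypes (0, 1 or 2 copies of the alternate allele).

    Assumptions of this sketch (not from the source): Hardy-Weinberg
    prior from `alt_freq`, binomial read sampling, and one symmetric
    per-read error rate.
    """
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    if not 0.0 <= alt_freq <= 1.0:
        raise ValueError("alt_freq must lie in [0, 1]")
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must lie in (0, 0.5)")

    # Hardy-Weinberg prior on hom-ref, het, hom-alt.
    priors = [(1 - alt_freq) ** 2,
              2 * alt_freq * (1 - alt_freq),
              alt_freq ** 2]
    # Probability that a single read shows the alternate allele.
    p_alt_read = [error_rate, 0.5, 1 - error_rate]

    ref_reads = total_reads - alt_reads
    likelihoods = [math.comb(total_reads, alt_reads)
                   * p ** alt_reads * (1 - p) ** ref_reads
                   for p in p_alt_read]

    unnormalized = [pr * lk for pr, lk in zip(priors, likelihoods)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


if __name__ == "__main__":
    # One alternate read at 1X coverage, under a rare and a common allele.
    print(genotype_posteriors(1, 1, alt_freq=0.01))
    print(genotype_posteriors(1, 1, alt_freq=0.5))
```

With a single read, the data alone cannot separate a sequencing error from a true variant, so the prior carries most of the weight. That is the sense in which frequency information shared across a population can increase power at low coverage.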