Computational Molecular Evolution

Computational Molecular Evolution PDF Author: Ziheng Yang
Publisher: Oxford University Press, USA
ISBN: 0198566999
Category : Medical
Languages : en
Pages : 374

Get Book Here

Book Description
This book describes the models, methods and algorithms that are most useful for analysing the ever-increasing supply of molecular sequence data, with a view to furthering our understanding of the evolution of genes and genomes.

Computational Molecular Evolution

Computational Molecular Evolution PDF Author: Ziheng Yang
Publisher: Oxford University Press, USA
ISBN: 0198566999
Category : Medical
Languages : en
Pages : 374

Get Book Here

Book Description
This book describes the models, methods and algorithms that are most useful for analysing the ever-increasing supply of molecular sequence data, with a view to furthering our understanding of the evolution of genes and genomes.

Dynamic Homology and Phylogenetic Systematics

Dynamic Homology and Phylogenetic Systematics PDF Author: Ward Wheeler
Publisher:
ISBN:
Category : Nature
Languages : en
Pages : 380

Get Book Here

Book Description


Accelerating Monte Carlo methods for Bayesian inference in dynamical models

Accelerating Monte Carlo methods for Bayesian inference in dynamical models PDF Author: Johan Dahlin
Publisher: Linköping University Electronic Press
ISBN: 9176857972
Category :
Languages : sv
Pages : 139

Get Book Here

Book Description
Making decisions and predictions from noisy observations are two important and challenging problems in many areas of society. Some examples of applications are recommendation systems for online shopping and streaming services, connecting genes with certain diseases and modelling climate change. In this thesis, we make use of Bayesian statistics to construct probabilistic models given prior information and historical data, which can be used for decision support and predictions. The main obstacle with this approach is that it often results in mathematical problems lacking analytical solutions. To cope with this, we make use of statistical simulation algorithms known as Monte Carlo methods to approximate the intractable solution. These methods enjoy well-understood statistical properties but are often computational prohibitive to employ. The main contribution of this thesis is the exploration of different strategies for accelerating inference methods based on sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). That is, strategies for reducing the computational effort while keeping or improving the accuracy. A major part of the thesis is devoted to proposing such strategies for the MCMC method known as the particle Metropolis-Hastings (PMH) algorithm. We investigate two strategies: (i) introducing estimates of the gradient and Hessian of the target to better tailor the algorithm to the problem and (ii) introducing a positive correlation between the point-wise estimates of the target. Furthermore, we propose an algorithm based on the combination of SMC and Gaussian process optimisation, which can provide reasonable estimates of the posterior but with a significant decrease in computational effort compared with PMH. Moreover, we explore the use of sparseness priors for approximate inference in over-parametrised mixed effects models and autoregressive processes. This can potentially be a practical strategy for inference in the big data era. Finally, we propose a general method for increasing the accuracy of the parameter estimates in non-linear state space models by applying a designed input signal. Borde Riksbanken höja eller sänka reporäntan vid sitt nästa möte för att nå inflationsmålet? Vilka gener är förknippade med en viss sjukdom? Hur kan Netflix och Spotify veta vilka filmer och vilken musik som jag vill lyssna på härnäst? Dessa tre problem är exempel på frågor där statistiska modeller kan vara användbara för att ge hjälp och underlag för beslut. Statistiska modeller kombinerar teoretisk kunskap om exempelvis det svenska ekonomiska systemet med historisk data för att ge prognoser av framtida skeenden. Dessa prognoser kan sedan användas för att utvärdera exempelvis vad som skulle hända med inflationen i Sverige om arbetslösheten sjunker eller hur värdet på mitt pensionssparande förändras när Stockholmsbörsen rasar. Tillämpningar som dessa och många andra gör statistiska modeller viktiga för många delar av samhället. Ett sätt att ta fram statistiska modeller bygger på att kontinuerligt uppdatera en modell allteftersom mer information samlas in. Detta angreppssätt kallas för Bayesiansk statistik och är särskilt användbart när man sedan tidigare har bra insikter i modellen eller tillgång till endast lite historisk data för att bygga modellen. En nackdel med Bayesiansk statistik är att de beräkningar som krävs för att uppdatera modellen med den nya informationen ofta är mycket komplicerade. I sådana situationer kan man istället simulera utfallet från miljontals varianter av modellen och sedan jämföra dessa mot de historiska observationerna som finns till hands. Man kan sedan medelvärdesbilda över de varianter som gav bäst resultat för att på så sätt ta fram en slutlig modell. Det kan därför ibland ta dagar eller veckor för att ta fram en modell. Problemet blir särskilt stort när man använder mer avancerade modeller som skulle kunna ge bättre prognoser men som tar för lång tid för att bygga. I denna avhandling använder vi ett antal olika strategier för att underlätta eller förbättra dessa simuleringar. Vi föreslår exempelvis att ta hänsyn till fler insikter om systemet och därmed minska antalet varianter av modellen som behöver undersökas. Vi kan således redan utesluta vissa modeller eftersom vi har en bra uppfattning om ungefär hur en bra modell ska se ut. Vi kan också förändra simuleringen så att den enklare rör sig mellan olika typer av modeller. På detta sätt utforskas rymden av alla möjliga modeller på ett mer effektivt sätt. Vi föreslår ett antal olika kombinationer och förändringar av befintliga metoder för att snabba upp anpassningen av modellen till observationerna. Vi visar att beräkningstiden i vissa fall kan minska ifrån några dagar till någon timme. Förhoppningsvis kommer detta i framtiden leda till att man i praktiken kan använda mer avancerade modeller som i sin tur resulterar i bättre prognoser och beslut.

Seamless R and C++ Integration with Rcpp

Seamless R and C++ Integration with Rcpp PDF Author: Dirk Eddelbuettel
Publisher: Springer Science & Business Media
ISBN: 146146868X
Category : Computers
Languages : en
Pages : 236

Get Book Here

Book Description
Rcpp is the glue that binds the power and versatility of R with the speed and efficiency of C++. With Rcpp, the transfer of data between R and C++ is nearly seamless, and high-performance statistical computing is finally accessible to most R users. Rcpp should be part of every statistician's toolbox. -- Michael Braun, MIT Sloan School of Management "Seamless R and C++ integration with Rcpp" is simply a wonderful book. For anyone who uses C/C++ and R, it is an indispensable resource. The writing is outstanding. A huge bonus is the section on applications. This section covers the matrix packages Armadillo and Eigen and the GNU Scientific Library as well as RInside which enables you to use R inside C++. These applications are what most of us need to know to really do scientific programming with R and C++. I love this book. -- Robert McCulloch, University of Chicago Booth School of Business Rcpp is now considered an essential package for anybody doing serious computational research using R. Dirk's book is an excellent companion and takes the reader from a gentle introduction to more advanced applications via numerous examples and efficiency enhancing gems. The book is packed with all you might have ever wanted to know about Rcpp, its cousins (RcppArmadillo, RcppEigen .etc.), modules, package development and sugar. Overall, this book is a must-have on your shelf. -- Sanjog Misra, UCLA Anderson School of Management The Rcpp package represents a major leap forward for scientific computations with R. With very few lines of C++ code, one has R's data structures readily at hand for further computations in C++. Hence, high-level numerical programming can be made in C++ almost as easily as in R, but often with a substantial speed gain. Dirk is a crucial person in these developments, and his book takes the reader from the first fragile steps on to using the full Rcpp machinery. A very recommended book! -- Søren Højsgaard, Department of Mathematical Sciences, Aalborg University, Denmark "Seamless R and C ++ Integration with Rcpp" provides the first comprehensive introduction to Rcpp. Rcpp has become the most widely-used language extension for R, and is deployed by over one-hundred different CRAN and BioConductor packages. Rcpp permits users to pass scalars, vectors, matrices, list or entire R objects back and forth between R and C++ with ease. This brings the depth of the R analysis framework together with the power, speed, and efficiency of C++. Dirk Eddelbuettel has been a contributor to CRAN for over a decade and maintains around twenty packages. He is the Debian/Ubuntu maintainer for R and other quantitative software, edits the CRAN Task Views for Finance and High-Performance Computing, is a co-founder of the annual R/Finance conference, and an editor of the Journal of Statistical Software. He holds a Ph.D. in Mathematical Economics from EHESS (Paris), and works in Chicago as a Senior Quantitative Analyst.

Numerical Computations with GPUs

Numerical Computations with GPUs PDF Author: Volodymyr Kindratenko
Publisher: Springer
ISBN: 3319065483
Category : Computers
Languages : en
Pages : 404

Get Book Here

Book Description
This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and engineering computations. Each chapter is written by authors working on a specific group of methods; these leading experts provide mathematical background, parallel algorithms and implementation details leading to reusable, adaptable and scalable code fragments. This book also serves as a GPU implementation manual for many numerical algorithms, sharing tips on GPUs that can increase application efficiency. The valuable insights into parallelization strategies for GPUs are supplemented by ready-to-use code fragments. Numerical Computations with GPUs targets professionals and researchers working in high performance computing and GPU programming. Advanced-level students focused on computer science and mathematics will also find this book useful as secondary text book or reference.

Algebraic Statistics for Computational Biology

Algebraic Statistics for Computational Biology PDF Author: L. Pachter
Publisher: Cambridge University Press
ISBN: 9780521857000
Category : Mathematics
Languages : en
Pages : 440

Get Book Here

Book Description
This book, first published in 2005, offers an introduction to the application of algebraic statistics to computational biology.

Gaussian Processes for Machine Learning

Gaussian Processes for Machine Learning PDF Author: Carl Edward Rasmussen
Publisher: MIT Press
ISBN: 026218253X
Category : Computers
Languages : en
Pages : 266

Get Book Here

Book Description
A comprehensive and self-contained introduction to Gaussian processes, which provide a principled, practical, probabilistic approach to learning in kernel machines. Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

Machine Learning for Ecology and Sustainable Natural Resource Management

Machine Learning for Ecology and Sustainable Natural Resource Management PDF Author: Grant Humphries
Publisher: Springer
ISBN: 3319969781
Category : Science
Languages : en
Pages : 442

Get Book Here

Book Description
Ecologists and natural resource managers are charged with making complex management decisions in the face of a rapidly changing environment resulting from climate change, energy development, urban sprawl, invasive species and globalization. Advances in Geographic Information System (GIS) technology, digitization, online data availability, historic legacy datasets, remote sensors and the ability to collect data on animal movements via satellite and GPS have given rise to large, highly complex datasets. These datasets could be utilized for making critical management decisions, but are often “messy” and difficult to interpret. Basic artificial intelligence algorithms (i.e., machine learning) are powerful tools that are shaping the world and must be taken advantage of in the life sciences. In ecology, machine learning algorithms are critical to helping resource managers synthesize information to better understand complex ecological systems. Machine Learning has a wide variety of powerful applications, with three general uses that are of particular interest to ecologists: (1) data exploration to gain system knowledge and generate new hypotheses, (2) predicting ecological patterns in space and time, and (3) pattern recognition for ecological sampling. Machine learning can be used to make predictive assessments even when relationships between variables are poorly understood. When traditional techniques fail to capture the relationship between variables, effective use of machine learning can unearth and capture previously unattainable insights into an ecosystem's complexity. Currently, many ecologists do not utilize machine learning as a part of the scientific process. This volume highlights how machine learning techniques can complement the traditional methodologies currently applied in this field.

Compositional Data Analysis

Compositional Data Analysis PDF Author: Vera Pawlowsky-Glahn
Publisher: John Wiley & Sons
ISBN: 0470711353
Category : Mathematics
Languages : en
Pages : 405

Get Book Here

Book Description
It is difficult to imagine that the statistical analysis of compositional data has been a major issue of concern for more than 100 years. It is even more difficult to realize that so many statisticians and users of statistics are unaware of the particular problems affecting compositional data, as well as their solutions. The issue of ``spurious correlation'', as the situation was phrased by Karl Pearson back in 1897, affects all data that measures parts of some whole, such as percentages, proportions, ppm and ppb. Such measurements are present in all fields of science, ranging from geology, biology, environmental sciences, forensic sciences, medicine and hydrology. This book presents the history and development of compositional data analysis along with Aitchison's log-ratio approach. Compositional Data Analysis describes the state of the art both in theoretical fields as well as applications in the different fields of science. Key Features: Reflects the state-of-the-art in compositional data analysis. Gives an overview of the historical development of compositional data analysis, as well as basic concepts and procedures. Looks at advances in algebra and calculus on the simplex. Presents applications in different fields of science, including, genomics, ecology, biology, geochemistry, planetology, chemistry and economics. Explores connections to correspondence analysis and the Dirichlet distribution. Presents a summary of three available software packages for compositional data analysis. Supported by an accompanying website featuring R code. Applied scientists working on compositional data analysis in any field of science, both in academia and professionals will benefit from this book, along with graduate students in any field of science working with compositional data.

Data Mining in Bioinformatics

Data Mining in Bioinformatics PDF Author: Jason T. L. Wang
Publisher: Springer Science & Business Media
ISBN: 9781852336714
Category : Computers
Languages : en
Pages : 356

Get Book Here

Book Description
Written especially for computer scientists, all necessary biology is explained. Presents new techniques on gene expression data mining, gene mapping for disease detection, and phylogenetic knowledge discovery.