Fast Algorithms for Large-scale Phylogenetic Reconstruction

Fast Algorithms for Large-scale Phylogenetic Reconstruction PDF Author: Jakub Truszkowski
Publisher:
ISBN:
Category :
Languages : en
Pages : 135

Book Description
One of the most fundamental computational problems in biology is that of inferring evolutionary histories of groups of species from sequence data. Such evolutionary histories, known as phylogenies, are usually represented as binary trees in which leaves represent extant species and internal nodes represent their shared ancestors. As the amount of sequence data available to biologists increases, very fast phylogenetic reconstruction algorithms are becoming necessary. Currently, large sequence alignments can contain up to hundreds of thousands of sequences, making traditional methods, such as Neighbor Joining, computationally prohibitive. To address this problem, we have developed three novel fast phylogenetic algorithms. The first algorithm, QTree, is a quartet-based heuristic that runs in O(n log n) time. It is based on a theoretical algorithm that reconstructs the correct tree, with high probability, assuming every quartet is inferred correctly with constant probability. The core of our algorithm is a balanced search tree structure that enables us to locate an edge in the tree in O(log n) time. Our algorithm is several times faster than all the current methods, while its accuracy approaches that of Neighbor Joining. The second algorithm, LSHTree, is the first sub-quadratic time algorithm with theoretical performance guarantees under a Markov model of sequence evolution. Our new algorithm runs in O(n^{1+γ(g)} log^2 n) time, where γ is an increasing function of an upper bound on the mutation rate along any branch in the phylogeny, and γ(g)
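
As a rough illustration of what a single quartet query involves, the sketch below applies the standard four-point method to a matrix of pairwise distances. This is an assumption made for illustration only: the description above does not say how QTree infers its quartets, and the function name `quartet_topology` and the toy distance matrix are hypothetical.

```python
from typing import Sequence, Tuple

Split = Tuple[Tuple[int, int], Tuple[int, int]]


def quartet_topology(dist: Sequence[Sequence[float]],
                     a: int, b: int, c: int, d: int) -> Split:
    """Pick the quartet split whose two within-pair distances sum to the least.

    Under the four-point condition for tree-like (additive) distances,
    the true split has the smallest of the three possible pair sums.
    """
    candidates = [
        (dist[a][b] + dist[c][d], ((a, b), (c, d))),
        (dist[a][c] + dist[b][d], ((a, c), (b, d))),
        (dist[a][d] + dist[b][c], ((a, d), (b, c))),
    ]
    # min() returns the first minimum, so ties resolve in the order listed.
    return min(candidates, key=lambda item: item[0])[1]


# Toy additive distances for the tree ((0,1),(2,3)):
# every pendant edge has length 1 and the internal edge has length 2.
TOY_DIST = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
```

With estimated rather than exact distances the minimum can point to the wrong split, which is why the description only assumes that each quartet is inferred correctly with constant probability.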

Fast and Accurate Supertrees

Fast and Accurate Supertrees PDF Author: Markus Fleischauer
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Phylogenetics is the study of evolutionary relationships between biological entities; phylogenetic trees (phylogenies) are a visualization of these evolutionary relationships. Accurate approaches to reconstructing phylogenies from sequence data usually result in NP-hard optimization problems, hence local search heuristics have to be applied in practice. These methods are highly accurate and fast enough as long as the input data is not too large. Divide-and-conquer techniques are a promising approach to boost the scalability and accuracy of those local search heuristics on very large datasets. A divide-and-conquer method breaks down a large phylogenetic problem into smaller sub-problems that are computationally easier to solve. The solutions to the sub-problems (overlapping trees) are then combined using a supertree method. Supertree methods merge a set of overlapping phylogenetic trees into a supertree containing all taxa of the input trees. The challenge in supertree reconstruction is how to deal with conflicting information in the input trees. Many different algorithms for different objective functions have been suggested to resolve these conflicts. In particular, there are methods that encode the source trees in a matrix and construct the supertree by applying a local search heuristic to optimize the respective objective function. The most widely used supertree methods use such local search heuristics. However, to really improve the scalability of accurate tree reconstruction by divide-and-conquer approaches, accurate polynomial time methods are needed for the supertree reconstruction step. In this work, we present approaches for accurate polynomial time supertree reconstruction, in particular Bad Clade Deletion (BCD), a novel heuristic supertree algorithm with polynomial running time. BCD uses minimum cuts to greedily delete a locally minimal number of columns from a matrix representation to make it compatible.
Unlike local search heuristics, it is guaranteed to return the directed perfect phylogeny for the input matrix, corresponding to the parent tree of the input trees, if one exists. BCD can take support values of the source trees into account without an increase in complexity. We show how reliable clades can be used to restrict the search space for BCD and how those clades can be collected from the input data using the Greedy Strict Consensus Merger. Finally, we introduce a beam search extension for the BCD algorithm that keeps alive a constant number of partial solutions in each top-down iteration phase. The guaranteed worst-case running time of BCD with beam search extension is still polynomial. We present an exact and a randomized subroutine to generate suboptimal partial solutions. In our thorough evaluation on several simulated and biological datasets against a representative set of supertree methods, we found that BCD is more accurate than the most accurate supertree methods when using support values and search space restriction on simulated data. At the same time, BCD is faster than any other evaluated method. The beam search approach improved the accuracy of BCD on all evaluated datasets at the cost of speed. We found that BCD supertrees can boost maximum likelihood tree reconstruction when used as starting trees. Further, BCD could handle large-scale datasets where local search heuristics did not converge in reasonable time. Due to its combination of speed, accuracy, and the ability to reconstruct the parent tree if one exists, BCD is a promising approach to enable outstanding scalability of divide-and-conquer approaches.
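
To make the notion of a "compatible" matrix more concrete, the following sketch checks compatibility in a simplified setting. It assumes each matrix column is a complete clade (a set of taxa with no missing entries) and uses the textbook fact that such columns admit a directed perfect phylogeny exactly when every pair of clades is either disjoint or nested. It is not the BCD algorithm: real supertree matrices contain missing entries for taxa absent from a source tree, and the name `clades_compatible` and the taxon labels are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable


def clades_compatible(clades: Iterable[FrozenSet[str]]) -> bool:
    """Return True if every pair of clades is disjoint or nested.

    Each clade plays the role of one column of a complete binary matrix
    (the taxa whose entry in that column is 1).
    """
    clade_list = list(clades)
    for x, y in combinations(clade_list, 2):
        overlaps = bool(x & y)
        nested = x <= y or y <= x
        if overlaps and not nested:
            # The two clades share taxa but neither contains the other,
            # so no single rooted tree can display both.
            return False
    return True


# Hypothetical example inputs.
COMPATIBLE = [frozenset("AB"), frozenset("ABC"), frozenset("DE")]
CONFLICTING = [frozenset("AB"), frozenset("BC")]
```

In this simplified picture, a conflicting pair such as {A, B} and {B, C} is what forces a column deletion; the description above states that BCD selects the columns to delete greedily using minimum cuts.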

Reconstruction of Large-scale Phylogenies

Reconstruction of Large-scale Phylogenies PDF Author: Vaibhav Rajan
Publisher:
ISBN:
Category :
Languages : en
Pages : 99

Book Description


Inferring Phylogenies

Inferring Phylogenies PDF Author: Joseph Felsenstein
Publisher: Sinauer Associates Incorporated
ISBN: 9780878931774
Category : Science
Languages : en
Pages : 664

Book Description
Phylogenies, or evolutionary trees, are the basic structures necessary to think about and analyze differences between species. Statistical, computational, and algorithmic work in this field has been ongoing for four decades now, and there have been great advances in understanding. Yet no book has summarized this work. Inferring Phylogenies does just that in a single, compact volume. Phylogenies are inferred with various kinds of data. This book concentrates on some of the central ones: discretely coded characters, molecular sequences, gene frequencies, and quantitative traits. Also covered are restriction sites, RAPDs, and microsatellites.

Fast and Accurate Estimation of Large-scale Phylogenetic Alignments and Trees

Fast and Accurate Estimation of Large-scale Phylogenetic Alignments and Trees PDF Author: Kevin Jensen Liu
Publisher:
ISBN:
Category :
Languages : en
Pages : 458

Book Description
Phylogenetics is the study of evolutionary relationships. Phylogenetic trees and alignments play important roles in a wide range of biological research, including reconstruction of the Tree of Life - the evolutionary history of all organisms on Earth - and the development of vaccines and antibiotics. Today's phylogenetic studies seek to reconstruct trees and alignments on a greater number and variety of organisms than ever before, primarily due to exponential growth in affordable sequencing and computing power. The importance of phylogenetic trees and alignments motivates the need for methods to reconstruct them accurately and efficiently on large-scale datasets. Traditionally, phylogenetic studies proceed in two phases: first, an alignment is produced from biomolecular sequences with differing lengths, and, second, a tree is produced using the alignment. My dissertation presents the first empirical performance study of leading two-phase methods on datasets with up to hundreds of thousands of sequences. Relatively accurate alignments and trees were obtained using methods with high computational requirements on datasets with a few hundred sequences, but as datasets grew past 1000 sequences and up to tens of thousands of sequences, the set of methods capable of analyzing a dataset diminished and only the methods with the lowest computational requirements and lowest accuracy remained. Alternatively, methods have been developed to simultaneously estimate phylogenetic alignments and trees. Methods optimizing the treelength optimization problem - the most widely-used approach for simultaneous estimation - have not been shown to return more accurate trees and alignments than two-phase approaches. I demonstrate that treelength optimization under a particular class of optimization criteria represents a promising means for inferring accurate trees and alignments. 
The other methods for simultaneous estimation are not known to support analyses of datasets with a few hundred sequences due to their high computational requirements. The main contribution of my dissertation is SATe, the first fast and accurate method for simultaneous estimation of alignments and trees on datasets with up to several thousand nucleotide sequences. SATe improves upon the alignment and topological accuracy of all existing methods, especially on the most difficult-to-align datasets, while retaining reasonable computational requirements.

Algorithms in Bioinformatics

Algorithms in Bioinformatics PDF Author: Steven L. Salzberg
Publisher: Springer Science & Business Media
ISBN: 3642042406
Category : Science
Languages : en
Pages : 440

Book Description
These proceedings contain papers from the 2009 Workshop on Algorithms in Bioinformatics (WABI), held at the University of Pennsylvania in Philadelphia, Pennsylvania during September 12–13, 2009. WABI 2009 was the ninth annual conference in this series, which focuses on novel algorithms that address important problems in genomics, molecular biology, and evolution. The conference emphasizes research that describes computationally efficient algorithms and data structures that have been implemented and tested in simulations and on real data. WABI is sponsored by the European Association for Theoretical Computer Science (EATCS) and the International Society for Computational Biology (ISCB). WABI 2009 was supported by the Penn Genome Frontiers Institute and the Penn Center for Bioinformatics at the University of Pennsylvania. For the 2009 conference, 90 full papers were submitted for review by the Program Committee, and from this strong field of submissions, 34 papers were chosen for presentation at the conference and publication in the proceedings. The final program covered a wide range of topics including gene interaction networks, molecular phylogeny, RNA and protein structure, and genome evolution.

Models and Algorithms for Genome Evolution

Models and Algorithms for Genome Evolution PDF Author: Cedric Chauve
Publisher: Springer Science & Business Media
ISBN: 1447152980
Category : Computers
Languages : en
Pages : 329

Book Description
This authoritative text/reference presents a review of the history, current status, and potential future directions of computational biology in molecular evolution. Gathering together the unique insights of an international selection of prestigious researchers, this must-read volume examines the latest developments in the field, the challenges that remain, and the new avenues emerging from the growing influx of sequence data. These viewpoints build upon the pioneering work of David Sankoff, one of the founding fathers of computational biology, and mark the 50th anniversary of his first scientific article. The broad spectrum of rich contributions in this essential collection will appeal to all computer scientists, mathematicians and biologists involved in comparative genomics, phylogenetics and related areas.

Algorithms, Load Balancing Strategies, and Dynamic Kernels for Large-scale Phylogenetic Tree Inference Under Maximum Likelihood

Algorithms, Load Balancing Strategies, and Dynamic Kernels for Large-scale Phylogenetic Tree Inference Under Maximum Likelihood PDF Author: Benoit Morel
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Algorithms in Bioinformatics

Algorithms in Bioinformatics PDF Author: Ben Raphael
Publisher: Springer
ISBN: 364233122X
Category : Computers
Languages : en
Pages : 465

Book Description
This book constitutes the refereed proceedings of the 12th International Workshop on Algorithms in Bioinformatics, WABI 2012, held in Ljubljana, Slovenia, in September 2012. WABI 2012 is one of six workshops which, along with the European Symposium on Algorithms (ESA), constitute the ALGO annual meeting and focuses on algorithmic advances in bioinformatics, computational biology, and systems biology with a particular emphasis on discrete algorithms and machine-learning methods that address important problems in molecular biology. The 35 full papers presented were carefully reviewed and selected from 92 submissions. The papers include algorithms for a variety of biological problems including phylogeny, DNA and RNA sequencing and analysis, protein structure, and others.

Large-scale Phylogenetic Reconstruction from Arbitrary Gene-order Data

Large-scale Phylogenetic Reconstruction from Arbitrary Gene-order Data PDF Author: Jijun Tang
Publisher:
ISBN:
Category : Gene mapping
Languages : en
Pages : 230

Book Description