Algorithms for Phylogenetic Tree Reconstruction Based on Genome Rearrangements [microform]

Author: Bourque, Guillaume
Publisher: Ann Arbor, Mich. : University Microfilms International
ISBN:
Category :
Languages : en
Pages : 190


Models and Algorithms for Genome Evolution

Author: Cedric Chauve
Publisher: Springer Science & Business Media
ISBN: 1447152980
Category : Computers
Languages : en
Pages : 329


Book Description
This authoritative text/reference presents a review of the history, current status, and potential future directions of computational biology in molecular evolution. Gathering together the unique insights of an international selection of prestigious researchers, this must-read volume examines the latest developments in the field, the challenges that remain, and the new avenues emerging from the growing influx of sequence data. These viewpoints build upon the pioneering work of David Sankoff, one of the founding fathers of computational biology, and mark the 50th anniversary of his first scientific article. The broad spectrum of rich contributions in this essential collection will appeal to all computer scientists, mathematicians and biologists involved in comparative genomics, phylogenetics and related areas.

Enhance the Understanding of Whole-genome Evolution by Designing, Accelerating and Parallelizing Phylogenetic Algorithms

Author: Zhaoming Yin
Publisher:
ISBN:
Category : Algorithms
Languages : en
Pages :


Book Description
Advances in sequencing technology have increased the speed and reduced the cost of sequencing biological data. Making biological sense of this genomic data is a major challenge for both the algorithm design and the high-performance computing communities. Bioinformatics poses many open questions, such as how new functional genes arise, why genes are organized into chromosomes, how species are connected through the evolutionary tree of life, and why gene arrangements are subject to change. Phylogenetic analysis has become essential to research on the evolutionary tree of life. It helps us trace the history of species and the relationships between different genes or genomes over millions of years. One of the foundations of phylogenetic construction is the computation of distances between genomes. Because rearrangement events involve much more complicated combinatorial patterns, distance computation remains an active topic that belongs as much to mathematics as to biology. The problem is especially hard when the two input genomes have unequal gene content (with insertions/deletions and duplications). In this thesis, we discuss our contributions to distance estimation for unequal gene order data. Finding the median of three genomes is the key step in building the most parsimonious phylogenetic trees from genome rearrangement data. For genomes with unequal content, to the best of our knowledge, no algorithm exists for finding the median. In this thesis, we contribute to median computation in two respects. 1) On the algorithm engineering side, we harness streaming graph analytics methods to implement an exact DCJ median algorithm that runs as fast as the heuristic algorithm and can help construct a better phylogenetic tree. 2) On the algorithmic side, we theoretically formulate the problem of finding the median of genomes with unequal gene content, which leads to the design and implementation of an efficient median algorithm based on the Lin-Kernighan heuristic.
Inferring the phylogeny (evolutionary history) of a given set of species is the ultimate goal once the distance and median models are chosen. For more than a decade, biologists and computer scientists have studied how to infer phylogenies by measuring genome rearrangement events using gene order data. While evolution is not an inherently parsimonious process, maximum parsimony (MP) phylogenetic analysis has been widely applied to phylogeny inference to study the evolutionary patterns of genome rearrangements. MP phylogenetic analysis based on genome rearrangements generally raises two problems: first, given a set of modern genomes, how to compute the topology of the corresponding phylogenetic tree; second, given the topology of a model tree, how to infer the gene orders of the ancestral species. Assembling an MP phylogenetic tree constructor involves multiple NP-hard problems, and unfortunately they are layered one on top of another. This means that solving one NP-hard problem requires solving multiple NP-hard sub-problems. For phylogenetic tree construction from unequal-content genomes, there are three layers of NP-hard problems. In this thesis, we mainly discuss our contributions to the design and implementation of the software package DCJUC (Phylogeny Inference using DCJ model to cope with Unequal Content Genomes), which helps achieve both of these goals.
Aside from the biological problems, another concern is how to use parallel computing to accelerate algorithms so that they can handle huge data sets, such as high-resolution gene order data. For one thing, all of the methods for tackling these phylogenetic problems are based on branch-and-bound algorithms, which are quite irregular and unfriendly to parallel computing. To parallelize these algorithms, we need to improve the locality of memory access and the load-balancing methods so that each thread can reach its full potential. For another, a revolution is taking place in computing with the availability of commodity graphics processors such as Nvidia GPUs and of many-core CPUs such as the Cray XMT or the Intel Xeon Phi coprocessor with 60 cores. These architectures offer a new way to achieve high performance at much lower cost. However, these machines are not easy to program, and scientific computing code is hard to tune well on them. We explore the potential of these architectures to accelerate branch-and-bound based phylogenetic algorithms.
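To make the notion of a DCJ (double cut and join) distance concrete, the following is a minimal sketch for the simplest setting only: two circular, single-chromosome genomes with identical gene content and no duplicates. In that setting the distance is commonly computed as n - c, where n is the number of genes and c is the number of cycles formed by alternating between the adjacencies of the two genomes. This is not the thesis's method, which targets the much harder unequal-content case; the function names `adjacency_map` and `dcj_distance` and the signed-integer genome encoding are assumptions made for illustration.

```python
def adjacency_map(genome):
    """Map each gene extremity to its neighbour in a circular signed genome.

    A genome is a list of distinct nonzero signed integers (an assumption of
    this sketch). Gene g has a tail (g, "t") and a head (g, "h"); a positive
    gene is read tail to head, a negative gene head to tail.
    """
    partner = {}
    n = len(genome)
    for i in range(n):
        g, h = genome[i], genome[(i + 1) % n]
        right = (abs(g), "h" if g > 0 else "t")  # extremity where g ends
        left = (abs(h), "t" if h > 0 else "h")   # extremity where h begins
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance(a, b):
    """DCJ distance n - c for two circular genomes with equal gene content."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("this sketch requires identical gene content")
    adj_a, adj_b = adjacency_map(a), adjacency_map(b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle by alternating an adjacency of a with one of b.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3], [1, -2, 3]))  # one inversion apart
```

Identical genomes give one cycle per adjacency and therefore distance 0. Insertions, deletions, and duplications break the one-to-one matching of extremities that this sketch relies on, which is one reason the unequal-content problem described above is harder.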

Evolutionary Ancestor Inference Via Genome Rearrangement

Author: Zaky Adam
Publisher:
ISBN:
Category : University of Ottawa theses
Languages : en
Pages : 198


Evaluation of Phylogeny Reconstruction Algorithms

Author: Dehua Hang
Publisher:
ISBN:
Category : Branch and bound algorithms
Languages : en
Pages : 294


Inference of Insertion and Deletion Scenarios for Ancestral Genome Reconstruction and Phylogenetic Analyses

Author: Abdoulaye Diallo
Publisher:
ISBN:
Category :
Languages : en
Pages :


Phylogenetic Tree Reconstruction with Protein Linkage

Author: Junjie Yu
Publisher: Open Dissertation Press
ISBN: 9781361307441
Category :
Languages : en
Pages :


Book Description
This dissertation, "Phylogenetic Tree Reconstruction With Protein Linkage" by Junjie, Yu, 于俊杰, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author. Abstract: Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of the species. Existing algorithms usually represent each species as a binary string with each bit indicating whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree with each leaf representing a species (a binary string of equal length) and each internal node representing the hypothetical ancestor, the Fitch-Hartigan algorithm and the Sankoff algorithm are two polynomial-time algorithms which assign binary strings to internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (Hamming distance) between two species and by assuming the evolutionary history of each protein is independent. Since the function of a protein may depend on the existence of other proteins, the evolutionary history of these functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, the protein linkage distance for some pairs/sets of proteins: whole block linkage distance, partial block linkage distance, pairwise linkage distance is introduced. 
It is proved that the phylogenetic tree reconstruction problem to find the binary strings for the internal nodes of a phylogenetic tree that minimizes the sum of the Hamming distance and the linkage distance is NP-hard. In this thesis, a general algorithm to solve the phylogenetic tree reconstruction with protein linkage problem which runs in $O(4^m \cdot n)$ time for whole/partial block linkage distance and $O(4^m \cdot (m+n))$ time for pairwise linkage distance (compared to the straight-forward $O(4^m \cdot m \cdot n)$ or $O(4^m \cdot m^2 \cdot n)$ time algorithm) is introduced where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that our algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance. DOI: 10.5353/th_b4961816 Subjects: Phylogeny Combinatorial analysis
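The Hamming-distance baseline mentioned in this abstract can be illustrated with a small sketch of the bottom-up pass of Fitch-style parsimony on binary strings. It treats every position independently, which is exactly the simplification the dissertation argues against, and it does not model linkage distance at all. The nested-tuple tree encoding and the name `fitch` are assumptions made for this example, and only the minimum number of changes is computed, not the ancestral strings themselves.

```python
def fitch(tree):
    """Bottom-up Fitch pass on a rooted binary tree.

    A leaf is a string of '0'/'1' characters; an internal node is a pair
    (left, right). All leaves are assumed to have the same length.
    Returns (candidate_state_sets_per_site, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return [{c} for c in tree], 0
    left_sets, left_cost = fitch(tree[0])
    right_sets, right_cost = fitch(tree[1])
    sets = []
    cost = left_cost + right_cost
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            # Children agree on at least one state: no change needed here.
            sets.append(common)
        else:
            # Children disagree: keep both options and count one change.
            sets.append(a | b)
            cost += 1
    return sets, cost


if __name__ == "__main__":
    example = (("110", "100"), ("000", "001"))
    print(fitch(example)[1])
```

Because each site is scored on its own, two functionally dependent proteins that are gained or lost together are charged twice, independently; a linkage-aware objective of the kind described above would instead score such sites jointly.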

Ancestral Reconstruction and Investigations of Genomics Recombination on Chloroplasts Genomes

Author: Bashar Al-Nuaimi
Publisher:
ISBN:
Category :
Languages : en
Pages : 144


Book Description
The theory of evolution is based on modern biology. All new species emerge from an existing species. As a result, different species share common ancestry, as represented in the phylogenetic classification. Common ancestry may explain the similarities between all living organisms, such as general chemistry, cell structure, DNA as genetic material, and the genetic code. Individuals of one species share the same genes but (usually) different allele sequences of these genes. An individual inherits alleles from its ancestors or its parents. The goal of phylogenetic studies is to analyze the changes that occur in different organisms during evolution by identifying the relationships between genomic sequences and determining the ancestral sequences and their descendants. A phylogenetic study can also estimate the time of divergence between groups of organisms that share a common ancestor. Phylogenetic trees are useful in fields of biology such as bioinformatics, systematics, and comparative phylogenetics. An evolutionary tree, or phylogenetic tree, is a branching diagram of the evolutionary relationships among biological organisms or other entities, based on the differences and similarities in their genetic characteristics. Phylogenetic trees are built from molecular data such as DNA sequences and protein sequences. In a phylogenetic tree, the nodes represent genomic sequences and are called taxonomic units. Each branch connects two adjacent nodes. Similar sequences will be neighbors on the outer branches, and a common internal branch will link them to a common ancestor. Internal branches are called hypothetical taxonomic units. Thus, taxonomic units grouped together in the tree are implied to descend from a common ancestor.
The research conducted in this dissertation focuses on improving evolutionary prototypes and on appropriate and robust algorithms for solving phylogenetic inference problems and recovering ancestral information about gene order and DNA data in the evolution of the complete genome, as well as their applications.

A New Algorithm for the Reconstruction of Near-perfect Binary Phylogenetic Trees

Author: Kedar Dhamdhere
Publisher:
ISBN:
Category : Computational biology
Languages : en
Pages : 18


Book Description
Abstract: "In this paper, we consider the problem of reconstructing near-perfect phylogenetic trees using binary characters. A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree. The algorithm for reconstructing a perfect phylogeny for binary characters is computationally efficient but impractical in most real settings. A near-perfect phylogeny relaxes this assumption by allowing characters to mutate a constant number of times. We show that if the number of additional mutations required by the near-perfect phylogeny is bounded by q, then we can reconstruct the optimal near-perfect phylogenetic tree in time $2^{O(q^2)} n m^2$ where n is the number of taxa and m is the number of characters. This is a significant improvement over the previous best result of $n m^{O(q)} 2^{O(q^2 r^2)}$ where r is the number of states per character (2 for binary). This improvement could lead to the first practical phylogenetic tree reconstruction algorithm that is both computationally feasible and biologically meaningful. We finally outline a method to improve the bound to $q^{O(q)} n m^2$."
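For context on why the perfect case is considered efficient, here is a minimal sketch of the classical four-gamete test: a binary character matrix admits a perfect phylogeny (with the root state left free) exactly when no pair of characters exhibits all four combinations 00, 01, 10, and 11 across the taxa. This checks existence only and is not the near-perfect algorithm of the paper; the function name `admits_perfect_phylogeny` and the list-of-rows matrix format are assumptions made for illustration.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Four-gamete test on a binary matrix (rows = taxa, columns = characters).

    Returns True when no two columns show all four of the combinations
    (0,0), (0,1), (1,0), (1,1), i.e. when every pair of characters is
    compatible with mutating at most once on some common tree.
    """
    n_chars = len(matrix[0]) if matrix else 0
    for i, j in combinations(range(n_chars), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            return False
    return True


if __name__ == "__main__":
    compatible = [[0, 0], [0, 1], [1, 0]]
    conflicting = compatible + [[1, 1]]
    print(admits_perfect_phylogeny(compatible))
    print(admits_perfect_phylogeny(conflicting))
```

The pairwise check costs O(n m^2) for n taxa and m characters. A matrix that fails the test needs at least one character to mutate more than once, which is the situation the near-perfect relaxation with q extra mutations is meant to cover.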

Computational Frameworks for Indel-aware Evolutionary Analysis Using Large-scale Genomic Sequence Data

Author: Wei Wang
Publisher:
ISBN:
Category : Electronic dissertations
Languages : en
Pages : 167


Book Description
With the development of sequencing techniques, genetic sequencing data has been extensively used in evolutionary studies. The phylogenetic reconstruction problem, which is the reconstruction of evolutionary history from biomolecular sequences, is a fundamental problem. The evolutionary relationship between organisms is often represented by a phylogeny, which is a tree or network representation. The most widely used approach for reconstructing phylogenies from sequencing data involves two phases: multiple sequence alignment (MSA) and phylogenetic reconstruction from the aligned sequences. As the amount of biomolecular sequence data increases, it has become a major challenge to develop efficient and accurate computational methods for phylogenetic analyses of large-scale sequencing data. Due to the complexity of the phylogenetic reconstruction problem in modern phylogenetic studies, the traditional sequence-based phylogenetic analysis methods involve many over-simplified assumptions. In this thesis, we describe our contribution to relaxing some of these over-simplified assumptions in phylogenetic analysis. Insertion and deletion events, referred to as indels, carry much phylogenetic information but are often ignored in the reconstruction of phylogenies. We take indel uncertainties into account in multiple phylogenetic analyses by applying resampling and re-estimation. Another over-simplified assumption that we address, adopted by many commonly used non-parametric algorithms for resampling biomolecular sequences, is that all sites in an MSA evolve independently and are identically distributed (i.i.d.). Many evolutionary events, such as recombination and hybridization, may produce intra-sequence and functional dependence in biomolecular sequences that violates this assumption. We introduce SERES, a resampling algorithm for biomolecular sequences that can produce resampled replicates that preserve the intra-sequence dependence.
We describe the application of the SERES resampling and re-estimation approach to two classical problems: multiple sequence alignment support estimation and recombination-aware local genealogical inference. We show that these two statistical inference problems greatly benefit from the indel-aware resampling and re-estimation approach and from the preservation of intra-sequence dependence. A major drawback of SERES is that it requires parameters to ensure the synchronization of random walks on unaligned sequences. We introduce RAWR, a non-parametric resampling method designed for phylogenetic tree support estimation that does not require extra parameters. We show that the RAWR-based resampling and re-estimation method produces comparable or typically better performance than the traditional bootstrap approach on the phylogenetic tree support estimation problem. We further relax a commonly used assumption about phylogeny. Evolutionary history is usually considered to have a tree structure, and evolutionary events that cause reticulated gene flow are ignored. Previous studies show that alignment uncertainty greatly impacts downstream tree inference and learning. However, there is little discussion of the impact of MSA uncertainty on phylogenetic network reconstruction. We show evidence that the errors introduced in MSA estimation decrease the accuracy of the inferred phylogenetic network, and that an indel-aware reconstruction method is needed for phylogenetic network analysis. In this dissertation, we introduce our contribution to phylogenetic estimation using biomolecular sequence data involving complex evolutionary histories, such as sequence insertion and deletion processes and non-tree-like evolution.
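The traditional bootstrap that this abstract uses as its baseline can be sketched in a few lines: alignment columns are drawn independently with replacement, which is where the i.i.d. assumption enters. This is a generic illustration of column resampling, not an implementation of SERES or RAWR, whose random-walk procedures are not specified in the abstract. The name `bootstrap_alignment` and the list-of-strings alignment format are assumptions made for this example.

```python
import random


def bootstrap_alignment(msa, rng):
    """Draw one bootstrap replicate of an alignment.

    msa is a list of equal-length aligned sequences (strings); rng is a
    random.Random instance. Columns are sampled independently and with
    replacement, so the original order of sites is not preserved.
    """
    n_sites = len(msa[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in picks) for seq in msa]


if __name__ == "__main__":
    alignment = ["ACGT-A", "AC-TTA", "ACGTTA"]
    replicate = bootstrap_alignment(alignment, random.Random(0))
    for row in replicate:
        print(row)
```

Each replicate keeps the number of sequences and sites but scrambles which columns sit next to each other, so any dependence between neighbouring sites is lost. That loss is the limitation that motivates resampling schemes designed to preserve intra-sequence dependence.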