Probabilistic Algorithms for Predicting Functional Regions in Protein-coding Genes

Author: Itay Mayrose
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description

Computational Prediction of Functional Elements Through Comparative Genomics.

Author: Xu Ling
Publisher:
ISBN: 9781243753892
Category :
Languages : en
Pages : 106

Book Description
Understanding the evolution and organization of genomic functional elements is one of the most important goals of genomic studies. The complexity of the functional information encoded in genome sequences, and the variability in how that information is encoded, make this a very challenging task. Nucleotide mutations and genome-wide rearrangements pose additional challenges for identifying and understanding the functional elements in the genome. On the other hand, due to natural selection, functional sequences tend to evolve at a slower rate than non-functional sequences. Therefore, the conservation pattern across species often indicates where functional sequences are located. With the increasing number of species being sequenced, comparative genomics, which compares the sequences from multiple species at varying evolutionary distances, has now emerged as a very powerful approach for identifying various types of functional elements, such as protein-coding genes, transcriptional regulatory sequences, and non-coding RNA genes. This dissertation research has focused on two grand challenges of genomics: (i) to decode cis-regulatory modules (CRMs), non-coding DNA sequences controlling gene expression; and (ii) to discover gene groups that are functionally related. For both lines of work, the key idea is to leverage the power of comparative genomics in decoding the genomic information. The first part of this thesis developed a probabilistic framework for CRM prediction. This framework is based on a probabilistic model of CRM evolution, which captures the content feature of regulatory sequences as well as their dynamic process of evolution.
This model advances previous models by dealing with the inherent uncertainties of transcription factor binding site (TFBS) annotations in a probabilistic framework. Because partially conserved binding sites have been recognized as an important aspect of regulatory sequence evolution, we explicitly model the two stochastic processes of loss of existing TFBSs and TFBS gain from background nucleotides, leveraging the power of comparative genomics for CRM prediction while at the same time utilizing the information in this lineage-specific pattern. The second part of this thesis focuses on discovering functionally related gene groups. Understanding how genes are organized in genomes and what information is encoded in genomic contexts is one of the fundamental problems in genomics. During evolution, gene order is generally not well conserved because of the rapid rearrangement events that reshuffle genomes. On the other hand, functionally related genes may be constrained to remain close to each other due to natural selection, forming so-called conserved gene clusters. Conservation of the spatial organization of genes provides an important source of information that is orthogonal to the primary sequences of genes and thus could be exploited to supplement our existing genomic analysis tools. In this thesis, we developed a highly efficient algorithm to discover conserved gene clusters across multiple genomes. These gene clusters are likely under some evolutionary constraint and indicate functional relationships among the genes within a cluster. Our algorithm advances existing work by allowing genes in the clusters to appear in different orders and at the same time making the computation orders of magnitude faster. This allows us to detect conserved gene clusters under flexible evolutionary constraints in a large number of genomes.
In addition, we developed a statistical evaluation method, which incorporates the evolutionary relationship among genomes, a key aspect that has been missing in...

Discovering Protein Functional Regions and Protein-protein Interaction Using Co-occurring Aligned Pattern Clusters

Author: Sanderz Fung
Publisher:
ISBN:
Category :
Languages : en
Pages : 52

Book Description
Bioinformatics is a rapidly expanding field of research due to multiple recent advancements: 1) the advent of machine intelligence, 2) the increase of computing power, 3) our better understanding of the underlying biomolecular mechanisms, and 4) the drastic reduction of biosequencing cost and time. Since wet laboratory approaches to analysing protein sequences are still labour intensive and time consuming, more cost-effective computational approaches for analyzing protein sequences and their biochemical interactions are crucial. This is especially true when we encounter a large collection of protein sequences. Aligned Pattern CLustering (APCL), an algorithm which combines machine intelligence methodologies such as pattern recognition, pattern discovery, pattern clustering and alignment, formulated by my research group and myself, is one such technique. APCL discovers, prunes, and clusters aligned statistically significant patterns to assemble a related, or more specifically a homologous, group of patterns in the form of an Aligned Pattern Cluster (APC). In many of our empirical experiments, the APCs obtained were found to correspond to statistically and functionally significant association patterns, which in turn correspond to conserved regions such as binding segments within and between protein sequences, as well as between protein Transcription Factors (TFs) and DNA Transcription Factor Binding Sites (TFBSs). While several known algorithms also exist to find functionally conserved segments in biosequences, they are less flexible and require more parameters than APCL. Hence, APCL is a powerful tool to analyze biosequences. Because of its effectiveness, the use of APCL is further extended from assisting in the discovery and analysis of functional regions of protein sequences to the exploration of co-occurrence of patterns on the same sequences, or of interacting patterns between sequences, from the discovered APCs.
Two new algorithms are introduced and reported in this thesis in the exploration of 1) APCs containing patterns residing within the same biosequences and 2) APCs containing patterns residing between interacting biosequences. The first algorithm attempts to cluster APCs that share patterns on the same biosequences. It uses a co-occurrence score between APCs in a co-occurrence APC pair (two APCs containing co-occurrence patterns) to account for the proportion of biosequences of co-occurrence patterns they share against the total number of sequences containing them. Using this score as a similarity measure (or more precisely, as a co-occurrence measure), we devise a Co-occurrence APC Clustering Algorithm to cluster APCs obtained from a collection of related biosequences into a Co-Occurrence Cluster of APCs, abbreviated as cAPC. Each cAPC is then analyzed and verified to see whether or not there are essential biological functions associated with the APCs within that cluster. Cytochrome c and ubiquitin families were analyzed in depth, and it was validated that members in the same cAPC do cover the functional regions that have essential cooperative biological functions. The second algorithm takes advantage of the effectiveness of APCL to create a protein-protein interaction (PPI) identification and prediction algorithm. PPI prediction is a hot research problem in bioinformatics and proteomics. A good number of algorithms exist. State-of-the-art algorithms can achieve a high success rate in prediction, but provide results that are difficult to interpret. The research in this thesis tries to overcome this hurdle. This second algorithm uses an APC-PPI score between two APCs to account for the proportion of patterns residing on two different protein sequences. This score measures how often patterns in both APCs co-occur in the sequence data of two known interacting proteins.
The scores are then used to construct feature vectors, first to train a learning model from the known PPI data and later to predict the possible PPI between a protein pair. The algorithm's performance was comparable to that of state-of-the-art algorithms, but it provided results that are interpretable. Both algorithms, which extend APCL to find co-occurring patterns via co-occurrence of APCs, proved to be effective and useful, since APCL finds APCs quickly and effectively. The first algorithm discovered biological insights, supported by biological literature, which typically cannot be discovered solely through the analysis of biosequences. The second algorithm succeeded in providing accurate and descriptive PPI predictions. Hence, these two algorithms are useful in the analysis and prediction of proteins. In addition, through continued research and development, the second algorithm will be a powerful tool for the drug industry, as it can help find new PPIs, an important step in developing new drugs for different drug targets.
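
The co-occurrence score described above (sequences shared by two APCs against the total number of sequences containing either) can be illustrated with a short sketch. This is not the thesis's implementation: it assumes the score is a plain Jaccard-style ratio over sets of sequence identifiers, and the function name `cooccurrence_score` and the sequence IDs are hypothetical.

```python
def cooccurrence_score(seqs_a, seqs_b):
    """Jaccard-style ratio (an assumption, not the thesis's exact formula):
    sequences containing patterns from both APCs, divided by sequences
    containing patterns from at least one of them."""
    a, b = set(seqs_a), set(seqs_b)
    union = a | b
    if not union:
        return 0.0  # no sequences at all: define the score as zero
    return len(a & b) / len(union)


# Hypothetical sequence IDs on which each APC's patterns occur.
apc1_seqs = {"seq1", "seq2", "seq3", "seq4"}
apc2_seqs = {"seq3", "seq4", "seq5"}

score = cooccurrence_score(apc1_seqs, apc2_seqs)
print(f"co-occurrence score: {score:.2f}")
```

A score near 1 would mean the two APCs almost always appear on the same sequences, which is the kind of pair a clustering step would group together.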

Predicting Functional Regions in Proteins Using a Neural Network

Author: Badr Alshomrani
Publisher:
ISBN:
Category : Amino acid sequence
Languages : en
Pages : 104

Book Description


Functional Plant Genomics

Author: J F Morot-Gaudry
Publisher: CRC Press
ISBN: 1000610969
Category : Science
Languages : en
Pages : 739

Book Description
The openings offered by functional genomics reconcile organism biology and molecular biology, in order to define an integrative biology that should allow new insights about how a phenotype is built up from a genotype in interaction with its environment. This book covers a wide area of concepts and methods in genomics. These range from international

Biological Sequence Analysis

Author: Richard Durbin
Publisher: Cambridge University Press
ISBN: 113945739X
Category : Science
Languages : en
Pages : 372

Book Description
Probabilistic models are becoming increasingly important in analysing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analysing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and probabilistic evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods, and more generally to probabilistic methods of sequence analysis. Written by an interdisciplinary team of authors, it aims to be accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of the other fields, and at the same time present the state-of-the-art in this new and highly important field.

Probabilistic Integration of Heterogeneous, Contextual, and Cross-species Genome-wide Data for Protein Function Prediction

Author: Naoki Nariai
Publisher:
ISBN:
Category :
Languages : en
Pages : 200

Book Description
Abstract: Completed genome sequences from many organisms have revealed many genes with no known function. A critical challenge is the development of methods that will aid in the discovery of the molecular functions of the newly discovered genes, while identifying the biological processes in which these genes participate. Current sequence-based methods frequently fail to annotate gene function accurately. New computational approaches combining genomic, transcriptional and proteomic data generated from high-throughput technologies offer potential routes toward predictions of increased accuracy and greater coverage of unknowns. In this thesis, we describe and evaluate several probabilistic methods for protein function prediction that integrate heterogeneous genome-wide data, such as protein-protein interaction (PPI) data, mRNA expression data, protein domain, and localization information under a Bayesian framework. In a cross validation study in yeast, with the goal of predicting the Gene Ontology "biological process" terms, our integrated method increases recall by 18% over methods that only use PPI data, at 50% precision. We compared prediction accuracies in five different model organisms (human, mouse, fly, worm and yeast). Of the various types of genome-wide data incorporated, we found that PPI data contributes most significantly to the improved precision of predictions in yeast. We also develop a context-specific approach for protein function prediction in order to capture dependencies among the various types of biological information listed above. We found that context-specific methods improve prediction precision in some cases, but can also degrade performance for some predictions. Finally, we developed a method to integrate PPI networks between different species through homology mapping. We predict genes that participate in the insulin signaling pathway. 
This pathway is highly conserved between human and worm, and of profound biological and medical interest given its roles in diabetes and aging. In a cross validation study, our method which derives PPI relationships from both organisms significantly improved prediction performance over a method that only uses PPI data from either human or worm. We produce a large number of predictions in which a number of cases have reasonable literature support.
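
As a rough illustration of what integrating heterogeneous evidence "under a Bayesian framework" can look like, the sketch below combines several evidence sources by multiplying likelihood ratios into prior odds. It is not the thesis's model: it assumes the sources are conditionally independent (a naive Bayes simplification), and the function name `posterior_probability`, the prior, and all likelihood ratios are made-up illustrative values.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with per-source likelihood ratios in log-odds space.

    Assumes conditional independence between evidence sources, which is a
    simplification and not necessarily what the thesis does.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up likelihood ratios P(evidence | has function) / P(evidence | not),
# one per hypothetical data source.
evidence = {
    "ppi_neighbours": 4.0,   # interaction partners share the function
    "coexpression": 2.0,     # correlated mRNA expression
    "localization": 0.5,     # localization data argues against it
}

p = posterior_probability(prior=0.1, likelihood_ratios=evidence.values())
print(f"posterior probability of the function: {p:.3f}")
```

Working in log-odds keeps the update additive, so each data source contributes an independent term that can be inspected on its own.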

Evolutionary and Structural Signatures of Protein-coding Function

Author: Maxim Y. Wolf (Ph. D.)
Publisher:
ISBN:
Category :
Languages : en
Pages : 90

Book Description
In this thesis I observe evolutionary signatures in coding regions to: (1) understand the sources of highly mutable coding regions in mammals; (2) elucidate a new candidate function for a stop codon readthrough candidate gene, BRI3BP; and (3) show how rapid sequence-based structure approximations can help predict the structural impact of amino-acid changes. (1) First, I searched for deviations from the evolutionary signatures of coding regions to recognize synonymous acceleration elements (SAEs) in protein-coding genes. I showed that these are driven by an increased mutation rate, which persists in the human lineage, in otherwise evolutionarily-constrained protein-coding regions, providing an important resource to better characterize protein-coding constraint in mammals and within humans. (2) Second, I combined evolutionary signatures at the protein-coding and protein-folding level to characterize the functional implication of stop-codon readthrough in BRI3BP. I showed that this readthrough region has conserved spaced hydrophobic residues that pattern match to the -terminal helix forming a coiled-coil-like domain. This change alters BRI3BP function from pro-growth to pro-apoptotic, similarly to VEGF-A. This suggests that readthrough-triggered apoptosis may represent a general mechanism for limiting growth of cells with aberrant ribosomal termination. (3) Third, I used rapid protein-structure approximation of burial of residues based on protein sequence to predict the structural impact of amino acid alterations. I show that the prediction can be improved over using exclusively the hydrophobicity change of the residue. Overall, my work demonstrates how evolutionary and structural signatures can be used to predict highly mutational gene regions, readthrough function and structural impact of mutation.

Computational Methods For Understanding Bacterial And Archaeal Genomes

Author: Ying Xu
Publisher: World Scientific
ISBN: 1908979011
Category : Science
Languages : en
Pages : 494

Book Description
Over 500 prokaryotic genomes have been sequenced to date, and thousands more have been planned for the next few years. While these genomic sequence data provide unprecedented opportunities for biologists to study the world of prokaryotes, they also raise extremely challenging issues such as how to decode the rich information encoded in these genomes. This comprehensive volume includes a collection of cohesively written chapters on prokaryotic genomes, their organization and evolution, the information they encode, and the computational approaches needed to derive such information. A comparative view of bacterial and archaeal genomes, and how information is encoded differently in them, is also presented. Combining theoretical discussions and computational techniques, the book serves as a valuable introductory textbook for graduate-level microbial genomics and informatics courses.

The Handbook of Plant Functional Genomics

Author: Guenter Kahl
Publisher: John Wiley & Sons
ISBN: 3527622551
Category : Science
Languages : en
Pages : 576

Book Description
In this incisive, concise overview of this booming field, the editors -- two of the leading figures in the field with a proven track record -- combine their expertise to provide an invaluable reference on the topic. Following a treatment of transcriptome analysis, the book goes on to discuss replacement and mutation analysis, gene silencing and computational analysis. The whole is rounded off with a look at emerging technologies. Each chapter is accompanied by a concise overview, helping readers to quickly identify topics of interest, while important, carefully selected words and concepts are explained in a handy glossary. Equally accessible to both experienced scientists and newcomers to the field.