Genomic Data Mining for the Computational Prediction of Small Non-coding RNA Genes

Genomic Data Mining for the Computational Prediction of Small Non-coding RNA Genes PDF Author: Thao Thanh Thi Tran
Publisher:
ISBN:
Category : Data mining
Languages : en
Pages :

Book Description
The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This requirement constrains the types of ncRNAs, the organisms, and the regions in which ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. The ability to identify ncRNAs automatically on a genome-wide scale makes this approach well suited for incorporation into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies.

A Computational Tool for the Prediction of Small Non-coding RNA in Genome Sequences

A Computational Tool for the Prediction of Small Non-coding RNA in Genome Sequences PDF Author: Ning Yu
Publisher:
ISBN:
Category :
Languages : en
Pages : 52

Book Description
The purpose of researching bacterial gene expression is to control and prevent diseases caused by bacteria. Researchers have recently discovered that small non-coding RNAs (ncRNAs/sRNAs) perform a variety of critical regulatory functions in bacteria. Genome-wide computational searching has become an effective way to predict sRNAs because they share consistent sequence characteristics. This article proposes a hybrid computational approach, HybridRNA, for the prediction of small non-coding RNAs, which integrates three critical techniques: a secondary structure algorithm, thermodynamic stability analysis, and sequence conservation prediction. Relying on these computational techniques, our approach was used to search for sRNAs in Streptococcus pyogenes, one of the most important bacteria for human health. In this search, the five strongest sRNA candidates were predicted to be key components of known regulatory pathways in S. pyogenes.

Data Mining and Applications in Genomics

Data Mining and Applications in Genomics PDF Author: Sio-Iong Ao
Publisher: Springer Science & Business Media
ISBN: 1402089759
Category : Computers
Languages : en
Pages : 159

Book Description
Data Mining and Applications in Genomics presents data mining algorithms and their applications in genomics, with frontier case studies based on recent and current work at the University of Hong Kong and the Oxford University Computing Laboratory, University of Oxford. It provides a systematic introduction to the use of data mining algorithms as an investigative tool for applications in genomics. The book surveys the state of the art in data mining algorithms and their applications in genomics, and also serves as a reference work for researchers and graduate students working in these areas.

Small Non-Coding RNAs

Small Non-Coding RNAs PDF Author: Mathieu Rederstorff
Publisher: Humana
ISBN: 9781493949038
Category : Medical
Languages : en
Pages : 0

Book Description
This volume contains state-of-the-art methods tackling all aspects of small non-coding RNA biology. Small Non-Coding RNAs: Methods and Protocols guides readers through customized, dedicated protocols and technologies that will be of valuable help to all those willing to contribute to deciphering the numerous functions of small non-coding RNAs. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and key tips on troubleshooting and avoiding known pitfalls. Instructive and practical, Small Non-Coding RNAs: Methods and Protocols reaches out to biochemists, cellular and molecular biologists already working in the field of RNA biology, and to those just starting to study small non-coding RNAs.

Genomes, Browsers and Databases

Genomes, Browsers and Databases PDF Author: Peter Schattner
Publisher: Cambridge University Press
ISBN: 1139472712
Category : Science
Languages : en
Pages : 329

Book Description
The recent explosive growth of biological data has led to a rapid increase in the number of molecular biology databases. Because these databases are held in many different locations and often use varying interfaces and non-standard data formats, integrating and comparing their data can be difficult and time-consuming. This book provides an overview of the key tools currently available for large-scale comparisons of gene sequences and annotations, focusing on the databases and tools from the University of California, Santa Cruz (UCSC), Ensembl, and the National Center for Biotechnology Information (NCBI). Written specifically for biology and bioinformatics students and researchers, it aims to give an appreciation of the methods by which the browsers and their databases are constructed, enabling readers to determine which tool is the most appropriate for their requirements. Each chapter contains a summary and exercises to aid understanding and promote effective use of these important tools.

Handbook of Machine Learning Applications for Genomics

Handbook of Machine Learning Applications for Genomics PDF Author: Sanjiban Sekhar Roy
Publisher: Springer Nature
ISBN: 9811691584
Category : Technology & Engineering
Languages : en
Pages : 222

Book Description
Machine learning currently plays a pivotal role in the progress of genomics, and its applications are helping researchers understand the emerging trends and future scope of the field. This book provides comprehensive coverage of machine learning methods such as DNNs, CNNs, and RNNs for predicting the sequence of DNA and RNA binding proteins, gene expression, and splicing control. In addition, the book addresses multiomics data analysis of cancers using tensor decomposition, machine learning techniques for protein engineering, CNN applications in genomics, the challenges of long noncoding RNAs in human disease diagnosis, and how machine learning can be used as a tool to shape the future of medicine. It also compares and validates the outcomes of machine learning methods on genomic data against functional laboratory tests or formal clinical assessment. The topics of this book will be of interest to academics and practitioners working in functional genomics and machine learning. The book will also serve as a comprehensive guide for graduate, postgraduate, and Ph.D. scholars working in these fields.

Computational Methods for the Analysis of Genomic Data and Biological Processes

Computational Methods for the Analysis of Genomic Data and Biological Processes PDF Author: Francisco A. Gómez Vela
Publisher: MDPI
ISBN: 3039437712
Category : Medical
Languages : en
Pages : 222

Book Description
In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

Computational Study of Small Noncoding RNAs and Their Functions

Computational Study of Small Noncoding RNAs and Their Functions PDF Author: Xuefeng Zhou
Publisher:
ISBN:
Category : Electronic dissertations
Languages : en
Pages : 203

Book Description
Post-transcriptional gene regulation at the RNA level has recently been shown to be more widespread and important than previously assumed. While various regulatory RNA molecules have been reported in animals and plants, two prominent types of regulatory small RNAs are microRNAs (miRNAs) and endogenous short interfering RNAs (siRNAs). Because of their importance, their nature, and the difficulties in studying them, miRNAs have been an active research topic with many computational challenges. First, computational strategies for miRNA identification have been developed to overcome the technical hurdles of experimental methods based on expression screening. We propose and develop a novel ranking algorithm based on random walks to computationally predict novel miRNAs from genomes that have only a few known miRNAs, may be poorly annotated, and may not even be completely assembled. We also develop a meta-feature-based classification method to identify miRNAs from high-throughput sequencing data of small RNAs. In addition, we devise a pipeline to analyze natsiRNAs in high-throughput sequencing data. Second, we formulate the problem of promoter prediction based on a multiple-instance learning scheme, and propose an effective promoter identification algorithm, called CoVote. We apply CoVote to predict microRNA core promoters. We investigate core promoter regions of microRNA genes in Caenorhabditis elegans, Homo sapiens, Arabidopsis thaliana and Oryza sativa, and further analyze sequence motifs in the putative core promoters that may be involved in the transcription of microRNA genes. Furthermore, with characterized promoters of miRNA genes, we apply data mining approaches to model the transcriptome of miRNAs under particular conditions. Finally, by integrating miRNA target genes, we further analyze the miRNA-mediated regulatory networks and computationally identify network motifs. Since miRNAs and their targets can be formulated as a natural bipartite network (graph), we propose and develop a tool to study modules in the miRNA-regulatory network.

Methods for Computational Gene Prediction

Methods for Computational Gene Prediction PDF Author: William H. Majoros
Publisher:
ISBN:
Category : Computers
Languages : en
Pages : 456

Book Description
A self-contained, rigorous text describing models used to identify genes in genomic DNA sequences.

Non-coding RNA Genes in Eukaryotes Genomes

Non-coding RNA Genes in Eukaryotes Genomes PDF Author: Chun-Long Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 191

Book Description
It has become clear that non-coding RNAs (ncRNAs) participate in the control of gene expression at different levels of regulation. However, ncRNA genes are usually not annotated within genomes. A better understanding of genome function requires refined computational tools for ncRNA prediction, some of which are emerging in the current genomic era. I developed a computational system, called snoRMP, to identify the box C/D snoRNAs that play a fundamental role in ribosome biogenesis. I applied it to the rice genome and identified 346 snoRNAs grouped into 120 paralogous sets, whose sequence differences provided clues about the mechanisms of duplication and evolution of snoRNAs. I also used snoRMP to screen the genomes of Schizosaccharomyces pombe, Drosophila melanogaster and Chlamydomonas reinhardtii. In addition, I performed an extensive analysis of 415 rRNA and box C/D snoRNA complementary sequences involved in methylation of 124 rRNA sites from fungi, plants and animals. I was able to define snoRNA-rRNA duplex cores of 9 base pairs, within which single mutations had been severely counter-selected and double compensatory mutations retained. The Paramecium tetraurelia genome arose through at least three whole-genome duplications (WGDs). In contrast to most genomes that evolved by WGD, which have lost a large fraction of their gene duplicates, the P. tetraurelia genome has not. I used motif-based methods to recover an extensive set of P. tetraurelia RNA genes, and analyzed their evolution in this specific WGD context. Finally, I used a combination of comparative sequence analysis and structure predictions to analyze the entire complement of ncDNA and identify 137 ncRNA candidates.