Graphical Models for Peptide Identification of Tandem Mass Spectra

Author: John T. Halloran
Publisher:
ISBN:
Category :
Languages : en
Pages : 140

Book Description
Graphical models (GMs) provide a flexible framework for modeling phenomena. In the past few decades, GMs have become indispensable tools for machine learning and computational biology. They afford a wide range of modeling granularity, from permitting only the exact events of interest to occur to allowing all possible events in a phenomenon’s event space. For all such modeling considerations, GMs afford efficient algorithms to perform inference over the probabilistic quantities of interest. In this thesis, we show how GMs may be leveraged to improve both the identification accuracy and the search runtime of tandem mass (MS/MS) spectra. For the majority of existing MS/MS scoring algorithms, we give equivalent GMs and show how search time may be algorithmically improved. We present GMs for posterior-based (sum-product) and max-product inference which offer state-of-the-art performance and, most importantly, are amenable to efficient parameter estimation. Furthermore, we show how a GM which generatively models the stochastic process by which peptides produce MS/MS spectra may be utilized to calculate features for improved classification between correctly and incorrectly identified spectra, leading to significantly improved identification accuracy.

Algorithms for Tandem Mass Spectrometry-based Proteomics

Author: Ari Michael Frank
Publisher:
ISBN:
Category :
Languages : en
Pages : 205

Book Description
Tandem mass spectrometry (MS/MS) has emerged as the leading technology for high-throughput proteomics analysis, making it possible to rapidly identify and characterize thousands of different proteins in complex biological samples. In recent years we have witnessed a dramatic increase in the capability to acquire proteomics MS/MS data. To avoid computational bottlenecks, this growth in acquisition power must be accompanied by a comparable improvement in analysis capabilities. In this dissertation we present several algorithms we developed to meet some of the major computational challenges that have arisen in MS/MS analysis. Throughout our work we continually address two (sometimes overlapping) problems: how to make MS/MS-based sequence identifications more accurate, and how to make the identification process work much faster. Much of the work we present revolves around algorithms for de novo sequencing of peptides, which aims to discover the amino acid sequence of protein digests (peptides), solely from their experimental mass spectrum. We start off by describing a new scoring model which is used in our de novo sequencing algorithm called PepNovo. Our scoring scheme is based on a graphical model decomposition that describes many of the conditions that determine the intensities of fragment ions observed in mass spectra, such as dependencies between related fragment ions and the influence of the amino acids adjacent to the cleavage site. Besides predicting whole peptide sequences, one of the most useful applications of de novo algorithms is to generate short sequence tags for the purpose of database filtration. We demonstrate how using these tags speeds up database searches by two orders of magnitude compared to conventional methods. We extend the use of tag filtration and show that with high-resolution data, our de novo sequencing is accurate enough to enable extremely rapid identification via direct hash lookup of peptide sequences. 
The vast amount of MS/MS data that has become available has made it possible to use advanced data-driven machine learning methods to devise more acute algorithms. We describe a new scoring function for peptide-spectrum matches that uses the RankBoost ranking algorithm to learn and model the influences of the many intricate processes that occur during peptide fragmentation. Our method's superior discriminatory power boosts PepNovo's performance beyond the current state-of-the-art de novo sequencing algorithms. Our score also greatly improves the performance of database search programs, significantly increasing both their speed and sensitivity. When we applied our method to the challenging task of a proteogenomic search against a six-frame translation of the human genome, we increased the number of peptide identifications by 60% compared to current techniques. To help speed up MS/MS analysis, we developed a clustering algorithm that exploits the redundancy that is inherent in large mass spectrometry datasets (these often contain hundreds and even thousands of spectra of the same peptide). When applied to large MS/MS datasets on the order of ten million spectra, our clustering algorithm reduces the number of spectra by an order of magnitude, without losing peptide identifications. Finally, we touch upon sequencing of intact proteins ("top-down" analysis), which, from a computational perspective, is only in its infancy: very few algorithms have been developed for analysis of this type of data. We developed MS-TopDown, which uses the Spectral Alignment algorithm to characterize protein forms (i.e., determine the modification/mutation sites). Our algorithm can handle heavily modified proteins and can also distinguish between several isobaric protein forms present in the same spectrum.

Peptide Identification of Tandem Mass Spectrometry from Quadrupole Time-of-flight Mass Spectrometers

Author: Kuang-Ying Hsi
Publisher:
ISBN:
Category :
Languages : en
Pages : 46

Book Description
Tandem mass spectrometry (MS2) is widely used for peptide and protein identification. One of the most fundamental problems for peptide identification in MS2 is to score peptide annotations against the spectrum that the peptide produces. In this thesis, a Bayesian network model is proposed for scoring peptides from Q-TOF mass spectrometers. The research is based on the Bayesian network probabilistic methodology used by the InsPecT software, which exploits a hybrid strategy of both database search and de novo algorithms for peptide identification. Initially, we focused on the connections of the InsPecT scoring model without changing any nodes. We attempted to determine the connections between nodes for Q-TOF by their dependencies. To test whether the complete set of nodes in the original InsPecT scoring model is needed, we reduced the number of nodes and, surprisingly, obtained a significant improvement in peptide identification performance. The 18-node model was reduced to 10-node models for both charge 2 and charge 3 ions, yielding gains in spectrum identification of 37.51% for charge 2 and 57.68% for charge 3 ions compared to the 2006.10.20 version of the InsPecT software. The simplified model also reduces computation time. Currently, InsPecT does not perform as well as Mascot on Q-TOF data. One reason may be that InsPecT was originally trained on LTQ data, and in this thesis we focused our improvements only on the InsPecT scoring stage. Deficiencies may occur in the initial tagging and in the final calculation of the score. Further research could exhaustively explore combinations of fragment ions to derive the most discriminative and informative set of ions.

Modern Proteomics – Sample Preparation, Analysis and Practical Applications

Author: Hamid Mirzaei
Publisher: Springer
ISBN: 3319414488
Category : Science
Languages : en
Pages : 525

Book Description
This volume serves as a proteomics reference manual, describing experimental design and execution. The book also presents a large number of examples of what can be achieved using proteomics techniques. As proteomics is a relatively young area of scientific research, the breadth and depth of the current state of the art might not be obvious to all potential users. There are various books and review articles that cover certain aspects of proteomics, but they often lack technical details. Subject-specific literature also lacks the broad overviews that are needed to design an experiment in which all steps are compatible and coherent. The objective of this book was to create a proteomics manual to provide scientists who are not experts in the field with an overview of:
1. The types of samples that can be analyzed by mass spectrometry for proteomics analysis.
2. Ways to convert biological or ecological samples to analytes ready for mass spectral analysis.
3. Ways to reduce the complexity of the proteome to achieve better coverage of the constituent proteins.
4. How various mass spectrometers work and the different ways they can be used for proteomics analysis.
5. The various platforms that are available for proteomics data analysis.
6. The various applications of proteomics technologies in biological and medical sciences.
This book should appeal to anyone with an interest in proteomics technologies, proteomics-related bioinformatics, and proteomics data generation and interpretation. With its broad setup and chapters written by experts in the field, it offers information that is valuable for students as well as for researchers who are looking for a hands-on introduction to the strengths, weaknesses and opportunities of proteomics.

Algorithms for Peptide Identification by Tandem Mass Spectrometry

Author: Franz Roos
Publisher:
ISBN:
Category :
Languages : en
Pages : 144

Book Description


Algorithms for Peptide Identification from Mixture Tandem Mass Spectra

Author: Yi Liu
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The large amount of data collected in a mass spectrometry experiment requires effective computational approaches for automated analysis. Though the proteomics community has conducted extensive research for this purpose, challenges remain, one of which is that the identification rate of the collected MS/MS spectra is rather low. One significant contributor to this situation is the frequent occurrence of mixture spectra, which result from the concurrent fragmentation of multiple precursors in a single MS/MS spectrum. However, nearly all mainstream computational methods still assume that the acquired spectra come from a single precursor, and are thus not suitable for the identification of mixture spectra. In this research, we focused on developing effective algorithms for interpreting mixture tandem mass spectra, and our work comprises two components: de novo sequencing of mixture spectra and mixture spectra identification by database search. For the de novo sequencing approach, we first formulated the mixture spectra de novo sequencing problem mathematically and proposed a dynamic programming algorithm for the problem. Additionally, we used both simulated and real mixture spectra datasets to verify the efficiency of the algorithm described in the research. For the database search identification, we proposed an approach for matching mixture tandem mass spectra with a pair of peptide sequences acquired from the protein sequence database by incorporating a special de novo assisted filtration strategy. Besides the filtration strategy, we also introduced a method to give a reasonable estimate of the mixture coefficient, which represents the relative abundance level of the co-sequenced precursors.
The preliminary experimental results demonstrated the efficiency of the integrated filtration strategy and mixture coefficient estimating method in reducing examination space and also verified the effectiveness of the proposed matching scheme.

Novel Data Analysis Methods and Algorithms for Identification of Peptides and Proteins by Use of Tandem Mass Spectrometry

Author: Hua Xu
Publisher:
ISBN:
Category : Bioinformatics
Languages : en
Pages :

Book Description
Tandem mass spectrometry is one of the most important tools for protein analysis. This thesis focuses on the development of new methods and algorithms for tandem mass spectrometry data analysis. A database search engine, MassMatrix, has also been developed that incorporates these methods and algorithms. The program is publicly available both on the web server at www.massmatrix.net and as a deliverable software package for personal computers. Three different scoring algorithms have been developed to identify and characterize proteins and peptides by use of tandem mass spectrometry data. The first is targeted at the next generation of tandem mass spectrometers, which are capable of high mass accuracy and resolution. Two scores calculated by the algorithm are sensitive to high mass accuracy because this new algorithm explicitly incorporates mass accuracy into scoring potential peptide and protein matches for tandem mass spectra. The algorithm is further improved by employing Monte Carlo simulations to calculate ion abundance based scores without any assumptions or simplifications. For high mass accuracy data, MassMatrix provides improvements in sensitivity over other database search programs. The second scoring algorithm, based on peptide sequence tags inferred from tandem mass spectra, further improves the performance of MassMatrix for low mass accuracy tandem mass spectrometry data. The third algorithm is the first automated data analysis method that uses peptide retention times in liquid chromatography to evaluate potential peptide matches for tandem mass spectrometry data. The algorithm predicts reverse phase liquid chromatography retention times of peptides from their hydrophobicities and compares the predicted retention times with the observed ones to evaluate the peptide matches. In order to handle low quality data, a new method has also been developed to reduce noise in tandem mass spectra and screen out poor quality spectra.
In addition, a data analysis method for identification of disulfide bonds in proteins and peptides by tandem mass spectrometry data has been developed and incorporated in MassMatrix. By this new approach, proteins and peptides with disulfide bonds can be directly identified in tandem mass spectrometry with high confidence without any chemical reduction and/or other derivatization.

Algorithms for Peptide Identification Via Tandem Mass Spectrometry

Author: Thomas Tschager
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Identification of Proteins by Tandem Mass Spectrometry Using Improved Peptide Fragmentation Models

Author: Frédéric Schütz
Publisher:
ISBN:
Category : Mass spectrometry
Languages : en
Pages : 180

Book Description


Ultrafast and Real-time Peptide Identification from Tandem Mass Spectra

Author: Benjamin J. Diament
Publisher:
ISBN:
Category : Peptides
Languages : en
Pages : 83

Book Description