Algorithms for Peptide Identification from Mixture Tandem Mass Spectra

Algorithms for Peptide Identification from Mixture Tandem Mass Spectra PDF Author: Yi Liu
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The large amount of data collected in a mass spectrometry experiment requires effective computational approaches for automated analysis. Although the proteomics community has conducted extensive research for this purpose, challenges remain; one in particular is that the identification rate of the collected MS/MS spectra is rather low. One significant contributor is the frequent occurrence of mixture spectra, which result from the concurrent fragmentation of multiple precursors in a single MS/MS spectrum. However, nearly all mainstream computational methods still assume that each acquired spectrum comes from a single precursor, so they are not suitable for identifying mixture spectra. In this research, we focused on developing effective algorithms for interpreting mixture tandem mass spectra. The work comprises two components: de novo sequencing of mixture spectra and identification of mixture spectra by database search. For de novo sequencing, we first formulated the mixture spectra de novo sequencing problem mathematically and proposed a dynamic programming algorithm for it. We then used both simulated and real mixture spectra datasets to verify the efficiency of the algorithm. For database search identification, we proposed an approach that matches a mixture tandem mass spectrum with a pair of peptide sequences from the protein sequence database, incorporating a special de novo assisted filtration strategy. Besides the filtration strategy, we also introduced a method to give a reasonable estimate of the mixture coefficient, which represents the relative abundance of the co-sequenced precursors. The preliminary experimental results demonstrated the efficiency of the integrated filtration strategy and the mixture coefficient estimation method in reducing the examination space, and also verified the effectiveness of the proposed matching scheme.
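
To make the idea of a mixture coefficient concrete, here is a minimal sketch in Python. It is not the method from the thesis, whose details the description does not give. It assumes the observed spectrum and the two candidate peptide spectra have already been binned onto the same m/z grid, and that the observed intensities are roughly a weighted sum alpha * A + (1 - alpha) * B. The function name and the toy intensity vectors are hypothetical.

```python
def estimate_mixture_coefficient(observed, spec_a, spec_b):
    """Least-squares alpha for observed ~= alpha * spec_a + (1 - alpha) * spec_b.

    All three arguments are intensity lists binned on the same m/z grid
    (an assumption of this sketch, not something stated in the source).
    """
    if not (len(observed) == len(spec_a) == len(spec_b)):
        raise ValueError("spectra must share the same binning")

    # Rewrite the model as (observed - spec_b) ~= alpha * (spec_a - spec_b),
    # which is a one-parameter linear least-squares problem.
    diff = [a - b for a, b in zip(spec_a, spec_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("candidate spectra are identical; alpha is undefined")

    numer = sum((o - b) * d for o, b, d in zip(observed, spec_b, diff))

    # Clip to [0, 1] because alpha is interpreted as a relative abundance.
    return min(1.0, max(0.0, numer / denom))


# Toy example: a spectrum built as 70% of A plus 30% of B.
spec_a = [1.0, 0.0, 0.5, 0.0]
spec_b = [0.0, 1.0, 0.0, 0.5]
observed = [0.7 * a + 0.3 * b for a, b in zip(spec_a, spec_b)]
alpha = estimate_mixture_coefficient(observed, spec_a, spec_b)
```

On the toy input, alpha comes out at approximately 0.7. Real spectra would need noise handling and intensity normalization, which this sketch leaves out.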

Expanding the Toolbox of Tandem Mass Spectrometry with Algorithms to Identify Mass Spectra from More Than One Peptide

Expanding the Toolbox of Tandem Mass Spectrometry with Algorithms to Identify Mass Spectra from More Than One Peptide PDF Author: Jian Wang
Publisher:
ISBN: 9781303217050
Category :
Languages : en
Pages : 124

Book Description
In high-throughput proteomics, advances in computational methods and in novel experimental strategies often depend on each other. In several areas, mass spectrometry methods for data acquisition are ahead of the computational methods needed to interpret the resulting tandem mass (MS/MS) spectra. Although there are numerous situations where two or more peptides are co-fragmented in the same MS/MS spectrum, nearly all mainstream computational approaches still assume that each MS/MS spectrum comes from only one peptide. In this thesis we address problems in three emerging areas where computational tools that relax this assumption are crucial for the successful large-scale application of these approaches. In the first chapter we describe algorithms for the identification of mixture spectra that come from more than one co-eluting peptide precursor. The ability to interpret mixture spectra not only improves peptide identification in traditional data-dependent-acquisition (DDA) workflows but is also crucial for the successful application of emerging data-independent-acquisition (DIA) techniques, which have the potential to greatly improve the throughput of peptide identification. In chapter two, we address the problem of identifying peptides with complex post-translational modifications (PTMs). Detection of PTMs is important for understanding the functional dynamics of proteins. Complex PTMs result from the conjugation of another macromolecule onto the substrate protein. The resulting modified peptides not only generate spectra that contain a mixture of fragment ions from both the PTM and the substrate peptide, but also display substantially different fragmentation patterns compared with conventional, unmodified peptides. We describe a hybrid experimental and computational approach to build search tools that capture the specific fragmentation patterns of modified peptides. Finally, in chapter three we address the problem of identifying linked peptides. Linked peptides are two peptides that are covalently linked together. The generation and identification of linked peptides has recently been demonstrated to be a versatile tool for studying protein-protein interactions and protein structures; however, the identification of linked peptides faces many challenges. We integrate lessons learned in the previous chapters to build an efficient and sensitive tool to identify linked peptides from MS/MS spectra.
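
One part of the linked-peptide problem can be illustrated with a short sketch: finding candidate peptide pairs whose combined mass, plus the mass added by the linker, matches the observed precursor mass. This is a generic illustration, not the tool described in the thesis. The function name, the tolerance, the peptide masses and the linker mass below are all hypothetical values chosen for the example.

```python
from bisect import bisect_left, bisect_right


def find_linked_pairs(peptide_masses, precursor_mass, linker_mass, tol_da=0.01):
    """Return pairs (m1, m2), m1 <= m2, with m1 + m2 + linker_mass ~= precursor_mass.

    Masses are neutral masses in daltons. A peptide may pair with itself,
    which models a linked homodimer (an assumption of this sketch).
    """
    target = precursor_mass - linker_mass
    masses = sorted(peptide_masses)
    pairs = []
    for i, m in enumerate(masses):
        # Binary search for partners whose mass lies within the tolerance
        # window; starting at index i avoids reporting each pair twice.
        lo = max(i, bisect_left(masses, target - m - tol_da))
        hi = bisect_right(masses, target - m + tol_da)
        for j in range(lo, hi):
            pairs.append((m, masses[j]))
    return pairs


# Illustrative values only.
candidates = [800.4, 1200.6, 1500.7, 999.5]
linker = 138.068
precursor = 800.4 + 1200.6 + linker
pairs = find_linked_pairs(candidates, precursor, linker)
```

Sorting once and using binary search avoids comparing every pair of candidates, which matters because the number of possible pairs grows quadratically with database size.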

Algorithms for Peptide Identification by Tandem Mass Spectrometry

Algorithms for Peptide Identification by Tandem Mass Spectrometry PDF Author: Franz Roos
Publisher:
ISBN:
Category :
Languages : en
Pages : 144

Book Description


Algorithms for Peptide Identification Via Tandem Mass Spectrometry

Algorithms for Peptide Identification Via Tandem Mass Spectrometry PDF Author: Thomas Tschager
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


High-Performance Algorithms for Mass Spectrometry-Based Omics

High-Performance Algorithms for Mass Spectrometry-Based Omics PDF Author: Fahad Saeed
Publisher: Springer Nature
ISBN: 3031019601
Category : Science
Languages : en
Pages : 146

Book Description
To date, processing of high-throughput mass spectrometry (MS) data is accomplished using serial algorithms. Developing new methods to process MS data is an active area of research, but there is no single strategy that focuses on the scalability of MS-based methods. Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules and metabolites in complex biological mixtures. In recent years the technology has evolved rapidly and is now capable of generating increasingly large (multiple terabytes per experiment) and complex (multiple species/microbiome/high-dimensional) data sets. This rapid advance in MS instrumentation must be matched by an equally rapid evolution of scalable methods for analyzing these complex data sets. Ideally, the new methods should leverage the rich, heterogeneous computational resources that are now ubiquitous in the form of multicore, manycore, CPU-GPU, CPU-FPGA, and Intel Xeon Phi architectures. The absence of such high-performance computing algorithms now hinders scientific advancement in mass spectrometry research. In this book we illustrate the need for high-performance computing algorithms for MS-based proteomics and proteogenomics, and showcase our progress in developing these algorithms.

Algorithms for Characterizing Peptides and Glycopeptides with Mass Spectrometry

Algorithms for Characterizing Peptides and Glycopeptides with Mass Spectrometry PDF Author: Lin He
Publisher:
ISBN:
Category :
Languages : en
Pages : 136

Book Description
The emergence of tandem mass spectrometry (MS/MS) technology has significantly accelerated protein identification and quantification in proteomics. It enables high-throughput analysis of proteins and their quantities in a complex protein mixture. A mass spectrometer can easily and rapidly generate large volumes of mass spectral data for a biological sample. This volume of data makes manual interpretation impossible and has also brought numerous challenges in automated data analysis. Algorithmic solutions have been proposed and provide indispensable analytical support in current proteomic experiments. However, new algorithms are still needed either to improve result accuracy or to provide additional data analysis capabilities for both protein identification and quantification. Accurate identification of the proteins in a sample is the first requirement of a proteomic study. In many cases, a mass spectrum cannot provide complete information to identify the peptide without ambiguity, because of the inefficiency of the peptide fragmentation technique and the prevalence of noise. We propose ADEPTS to address this problem using the complementary information provided by different types of mass spectra. Meanwhile, the occurrence of post-translational modifications (PTMs) on proteins is another major issue that prevents the interpretation of a large portion of spectra. With current software tools, users have to specify possible PTMs in advance. However, the number of possible PTMs has to be limited, since specifying more PTMs leads to a longer running time and lower result accuracy. We therefore develop DeNovoPTM and PeaksPTM to provide efficient and accurate solutions. Glycosylation is one of the most frequently observed PTMs in proteomics. It plays important roles in many disease processes and thus has attracted growing research interest. However, the lack of algorithms that can identify intact glycopeptides has become the major obstacle to glycoprotein studies. We propose a novel algorithm, GlycoMaster DB, to fulfil this urgent requirement. Additional research is presented on protein quantification, which studies changes in protein quantity by comparing two or more mass spectral datasets. A crucial problem in quantification is to correct the retention time distortions between different datasets. Heuristic solutions from previous research have been used in practice, but none of them has stated a clear optimization goal. To address this issue, we propose a combinatorial model and practical algorithms for this problem.

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry PDF Author: Han, Xi
Publisher:
ISBN:
Category :
Languages : en
Pages : 69

Book Description
Tandem mass spectrometry (MS/MS) is routinely used to identify peptides from protein mixtures in the field of proteomics. However, only about 30% to 40% of current MS/MS spectra can be identified; many remain unassigned even though they are of reasonable quality. The ubiquitous presence of post-translational modifications (PTMs) is one of the reasons for the current low spectral identification rate. To identify post-translationally modified peptides, most existing software requires the user to specify a few possible modifications. However, such knowledge of possible modifications is not always available. In this thesis, we describe a new algorithm for identifying modified peptides without requiring users to specify the possible modifications before the search routine; instead, all modifications from the Unimod database are considered. Several new techniques are employed to avoid exponential growth of the search space and to control the false discoveries that arise from this unrestricted search approach. A software tool, PeaksPTM, has been developed, and it has already achieved stronger performance than competing tools for unrestricted identification of post-translationally modified peptides. Another important reason for the failure of search tools is inaccurate measurement of the mass or charge state of the precursor peptide ion. In this thesis, we study the precursor mono-isotopic mass and charge determination problem, and propose an algorithm that corrects precursor ion mass error by assessing the isotopic features in its parent MS spectrum. The algorithm has been tested on two annotated data sets and achieved almost 100 percent accuracy. Furthermore, we have studied a more complicated problem, the MS/MS preprocessing problem, and propose a spectrum deconvolution algorithm. Experiments compare its performance with that of other existing software.
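
The link between isotopic features and charge state can be shown with a small sketch. It is not the algorithm from the thesis. It relies only on the standard observation that adjacent isotope peaks of an ion with charge z are spaced by roughly 1.00335/z on the m/z axis (the mass difference between carbon-13 and carbon-12, divided by the charge). The function names, the maximum charge and the example peak list are hypothetical.

```python
C13_C12_DELTA = 1.0033548   # mass difference between 13C and 12C, in Da
PROTON_MASS = 1.00727646688  # mass of a proton, in Da


def infer_charge(isotope_mzs, max_charge=6):
    """Pick the charge whose expected isotope spacing best fits the peaks.

    isotope_mzs: m/z values of consecutive peaks in one isotopic envelope.
    """
    peaks = sorted(isotope_mzs)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return min(
        range(1, max_charge + 1),
        key=lambda z: abs(mean_spacing - C13_C12_DELTA / z),
    )


def neutral_monoisotopic_mass(mono_mz, charge):
    """Convert a monoisotopic m/z of a protonated ion to its neutral mass."""
    return (mono_mz - PROTON_MASS) * charge


# Illustrative envelope with a spacing of about 0.5 m/z units.
envelope = [500.0000, 500.5017, 501.0034]
charge = infer_charge(envelope)
mass = neutral_monoisotopic_mass(envelope[0], charge)
```

For the example envelope, the spacing of about 0.5 points to charge 2. A practical method would also have to decide which peak is truly monoisotopic and cope with overlapping envelopes, which this sketch ignores.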

Novel Data Analysis Methods and Algorithms for Identification of Peptides and Proteins by Use of Tandem Mass Spectrometry

Novel Data Analysis Methods and Algorithms for Identification of Peptides and Proteins by Use of Tandem Mass Spectrometry PDF Author: Hua Xu
Publisher:
ISBN:
Category : Bioinformatics
Languages : en
Pages :

Book Description
Abstract: Tandem mass spectrometry is one of the most important tools for protein analysis. This thesis focuses on the development of new methods and algorithms for tandem mass spectrometry data analysis. A database search engine, MassMatrix, has also been developed that incorporates these methods and algorithms. The program is publicly available both on the web server at www.massmatrix.net and as a deliverable software package for personal computers. Three different scoring algorithms have been developed to identify and characterize proteins and peptides from tandem mass spectrometry data. The first is targeted at the next generation of tandem mass spectrometers, which are capable of high mass accuracy and resolution. Two scores calculated by the algorithm are sensitive to high mass accuracy because the algorithm explicitly incorporates mass accuracy into the scoring of potential peptide and protein matches for tandem mass spectra. The algorithm is further improved by employing Monte Carlo simulations to calculate ion-abundance-based scores without any assumptions or simplifications. For high mass accuracy data, MassMatrix provides improvements in sensitivity over other database search programs. The second scoring algorithm, based on peptide sequence tags inferred from tandem mass spectra, further improves the performance of MassMatrix for low mass accuracy tandem mass spectrometry data. The third algorithm is the first automated data analysis method that uses peptide retention times in liquid chromatography to evaluate potential peptide matches for tandem mass spectrometry data. The algorithm predicts reversed-phase liquid chromatography retention times of peptides from their hydrophobicities and compares the predicted retention times with the observed ones to evaluate the peptide matches. To handle low quality data, a new method has also been developed to reduce noise in tandem mass spectra and screen out poor quality spectra. In addition, a data analysis method for identifying disulfide bonds in proteins and peptides from tandem mass spectrometry data has been developed and incorporated in MassMatrix. With this new approach, proteins and peptides with disulfide bonds can be identified directly by tandem mass spectrometry with high confidence, without any chemical reduction or other derivatization.
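
As a rough illustration of what taking mass accuracy into account can mean, the sketch below computes the mass error in parts per million (ppm) and counts theoretical fragment masses matched within a ppm tolerance. It is not the MassMatrix scoring algorithm, which the abstract does not specify. The function names, peak values and tolerances are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed peak, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def count_matches(observed_peaks, theoretical_peaks, tol_ppm):
    """Count theoretical peaks with at least one observed peak within tol_ppm."""
    return sum(
        1
        for t in theoretical_peaks
        if any(abs(ppm_error(o, t)) <= tol_ppm for o in observed_peaks)
    )


# Illustrative values: one peak is 4 ppm off, the other about 143 ppm off.
observed = [500.002, 700.1]
theoretical = [500.0, 700.0]
tight = count_matches(observed, theoretical, tol_ppm=10)
loose = count_matches(observed, theoretical, tol_ppm=200)
```

With the 10 ppm tolerance only the first peak matches, while the 200 ppm tolerance accepts both. A tighter tolerance, which high-accuracy instruments make possible, admits fewer chance matches.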

Graphical Models for Peptide Identification of Tandem Mass Spectra

Graphical Models for Peptide Identification of Tandem Mass Spectra PDF Author: John T. Halloran
Publisher:
ISBN:
Category :
Languages : en
Pages : 140

Book Description
Graphical models (GMs) provide a flexible framework for modeling phenomena. In the past few decades, GMs have become indispensable tools for machine learning and computational biology. They afford a wide range of modeling granularity, from restricting only exact events of interest to occur to allowing all possible events in a phenomenon’s event space. For all such modeling considerations, GMs afford efficient algorithms to perform inference over the probabilistic quantities of interest. In this thesis, we show how GMs may be leveraged to improve both identification accuracy and search runtime of tandem mass (MS/MS) spectra. For the majority of existing MS/MS scoring algorithms, we give equivalent GMs and show how search time may be algorithmically improved. We present GMs for posterior based (sum-product) and max-product inference which offer state-of-the-art performance and, most importantly, are amenable to efficient parameter estimation. Furthermore, we show how a GM which generatively models the stochastic process by which peptides produce MS/MS spectra may be utilized to calculate features for improved classification between correctly and incorrectly identified spectra, leading to significantly improved identification accuracy.

Peptide Identification of Tandem Mass Spectrometry from Quadrupole Time-of-flight Mass Spectrometers

Peptide Identification of Tandem Mass Spectrometry from Quadrupole Time-of-flight Mass Spectrometers PDF Author: Kuang-Ying Hsi
Publisher:
ISBN:
Category :
Languages : en
Pages : 46

Book Description
Tandem mass spectrometry (MS2) is widely used for peptide and protein identification. One of the most fundamental problems for peptide identification in MS2 is to score peptide annotations against the spectrum produced by the peptide. In this thesis, a Bayesian network model is proposed for scoring peptides from Q-TOF mass spectrometers. The research is based on the Bayesian network probabilistic methodology used by the InsPecT software, which exploits a hybrid strategy of both database search and de novo algorithms for peptide identification. Initially, we focused on the connections of the InsPecT scoring model without changing its nodes, and attempted to determine the connections between nodes for Q-TOF data from their dependencies. To test whether the complete set of nodes of the original InsPecT scoring model is needed, we reduced the number of nodes, which surprisingly led to a significant improvement in peptide identification performance. The 18-node model was reduced to 10-node models for both charge 2 and charge 3 ions, and we obtained gains in spectra identification of 37.51% for charge 2 and 57.68% for charge 3 ions compared with InsPecT version 2006.10.20. The simplified model also reduces computation time. Currently, InsPecT does not perform as well as Mascot on Q-TOF data. One reason may be that InsPecT was originally trained on LTQ data, and in this thesis we focused our improvements only on the InsPecT scoring stage. Deficiencies may remain in the initial tagging and in the final calculation of the score. Further research could exhaustively evaluate combinations of fragment ions to derive the most discriminative and informative set.