Algorithms for Peptide Identification Via Tandem Mass Spectrometry

Author: Thomas Tschager
Languages: en

Algorithms for Peptide Identification by Tandem Mass Spectrometry

Author: Franz Roos
Languages: en
Pages: 144


Algorithms for Peptide Identification from Mixture Tandem Mass Spectra

Author: Yi Liu
Languages: en

Book Description
The large amount of data collected in a mass spectrometry experiment requires effective computational approaches for automated analysis. Although the proteomics community has conducted extensive research for this purpose, challenges remain; in particular, the identification rate of the collected MS/MS spectra is rather low. One significant reason is the frequent occurrence of mixture spectra, which result from the concurrent fragmentation of multiple precursors in a single MS/MS spectrum. However, nearly all mainstream computational methods still assume that each acquired spectrum comes from a single precursor, so they are not suitable for identifying mixture spectra. In this research, we focused on developing effective algorithms for interpreting mixture tandem mass spectra. Our work comprises two components: de novo sequencing of mixture spectra and identification of mixture spectra by database search. For the de novo sequencing approach, we first formulated the mixture spectra de novo sequencing problem mathematically and proposed a dynamic programming algorithm for it. We then used both simulated and real mixture spectra datasets to verify the efficiency of the algorithm. For database search identification, we proposed an approach for matching mixture tandem mass spectra with a pair of peptide sequences acquired from the protein sequence database, incorporating a special de novo assisted filtration strategy. Besides the filtration strategy, we also introduced a method to give a reasonable estimate of the mixture coefficient, which represents the relative abundance of the co-sequenced precursors. The preliminary experimental results demonstrated the efficiency of the integrated filtration strategy and the mixture coefficient estimation method in reducing the examination space, and also verified the effectiveness of the proposed matching scheme.
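
As an illustration of what a mixture coefficient expresses, the following is a minimal sketch, not the method of the thesis. It assumes the observed spectrum and two candidate theoretical spectra have already been binned into equal-length intensity vectors, and it fits a single coefficient by least squares. The function name and the vector representation are hypothetical.

```python
def estimate_mixture_coefficient(observed, theo_a, theo_b):
    """Least-squares alpha for observed ~ alpha*theo_a + (1 - alpha)*theo_b.

    All three arguments are intensity vectors over the same m/z bins
    (an assumption of this sketch). The result is clipped to [0, 1].
    """
    if not (len(observed) == len(theo_a) == len(theo_b)):
        raise ValueError("all intensity vectors must share the same binning")
    # Rewrite the model as (observed - theo_b) ~ alpha * (theo_a - theo_b).
    diff = [a - b for a, b in zip(theo_a, theo_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical templates: the coefficient is undetermined
    num = sum((s - b) * d for s, b, d in zip(observed, theo_b, diff))
    return min(1.0, max(0.0, num / denom))


# Toy example: a 70/30 mixture of two made-up template spectra.
template_a = [1.0, 0.0, 2.0, 0.0]
template_b = [0.0, 3.0, 0.0, 1.0]
mixed = [0.7 * a + 0.3 * b for a, b in zip(template_a, template_b)]
alpha = estimate_mixture_coefficient(mixed, template_a, template_b)
```

In this toy case `alpha` comes out close to 0.7, the share of the first precursor. Real spectra would need noise handling and peak matching within a mass tolerance, which this sketch leaves out.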

Algorithms for Tandem Mass Spectrometry-based Proteomics

Author: Ari Michael Frank
Languages: en
Pages: 205

Book Description
Tandem mass spectrometry (MS/MS) has emerged as the leading technology for high-throughput proteomics analysis, making it possible to rapidly identify and characterize thousands of different proteins in complex biological samples. In recent years we have witnessed a dramatic increase in the capability to acquire proteomics MS/MS data. To avoid computational bottlenecks, this growth in acquisition power must be accompanied by a comparable improvement in analysis capabilities. In this dissertation we present several algorithms we developed to meet some of the major computational challenges that have arisen in MS/MS analysis. Throughout our work we continually address two (sometimes overlapping) problems: how to make MS/MS-based sequence identifications more accurate, and how to make the identification process work much faster. Much of the work we present revolves around algorithms for de novo sequencing of peptides, which aims to discover the amino acid sequence of protein digests (peptides), solely from their experimental mass spectrum. We start off by describing a new scoring model which is used in our de novo sequencing algorithm called PepNovo. Our scoring scheme is based on a graphical model decomposition that describes many of the conditions that determine the intensities of fragment ions observed in mass spectra, such as dependencies between related fragment ions and the influence of the amino acids adjacent to the cleavage site. Besides predicting whole peptide sequences, one of the most useful applications of de novo algorithms is to generate short sequence tags for the purpose of database filtration. We demonstrate how using these tags speeds up database searches by two orders of magnitude compared to conventional methods. We extend the use of tag filtration and show that with high-resolution data, our de novo sequencing is accurate enough to enable extremely rapid identification via direct hash lookup of peptide sequences. 
The vast amount of MS/MS data that has become available has made it possible to use advanced data-driven machine learning methods to devise more acute algorithms. We describe a new scoring function for peptide-spectrum matches that uses the RankBoost ranking algorithm to learn and model the influences of the many intricate processes that occur during peptide fragmentation. Our method's superior discriminatory power boosts PepNovo's performance beyond the current state-of-the-art de novo sequencing algorithms. Our score also greatly improves the performance of database search programs, significantly increasing both their speed and sensitivity. When we applied our method to the challenging task of a proteogenomic search against a six-frame translation of the human genome, we increased the number of peptide identifications by 60% compared to current techniques. To help speed up MS/MS analysis, we developed a clustering algorithm that exploits the redundancy inherent in large mass spectrometry datasets (these often contain hundreds and even thousands of spectra of the same peptide). When applied to large MS/MS datasets on the order of ten million spectra, our clustering algorithm reduces the number of spectra by an order of magnitude, without losing peptide identifications. Finally, we touch upon sequencing of intact proteins ("top-down" analysis), which, from a computational perspective, is only in its infancy: very few algorithms have been developed for analysis of this type of data. We developed MS-TopDown, which uses the Spectral Alignment algorithm to characterize protein forms (i.e., determine the modification/mutation sites). Our algorithm can handle heavily modified proteins and can also distinguish between several isobaric protein forms present in the same spectrum.
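
The idea of using short sequence tags to filter a database, mentioned in the description above, can be sketched in a few lines. This is a toy illustration under stated assumptions, not PepNovo's implementation: the function name is hypothetical, the "database" is a plain list of peptide strings, and the only domain rule applied is that isoleucine (I) and leucine (L) have the same mass and so cannot be told apart by mass alone.

```python
def filter_by_tag(peptides, tag):
    """Return the peptides that contain the given sequence tag.

    I and L are isobaric, so both are mapped to L before comparison.
    """
    def normalize(seq):
        return seq.upper().replace("I", "L")

    wanted = normalize(tag)
    return [p for p in peptides if wanted in normalize(p)]


# Toy example with made-up peptide strings.
database = ["PEPTIDEK", "ELVISLIVESK", "SAMPLER"]
candidates = filter_by_tag(database, "LVL")
```

Only the candidates that survive the filter would then be scored against the spectrum, which is where the speed-up comes from. A practical filter would also check the masses flanking the tag and would index the database rather than scan it.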

Novel Data Analysis Methods and Algorithms for Identification of Peptides and Proteins by Use of Tandem Mass Spectrometry

Author: Hua Xu
Category: Bioinformatics
Languages: en

Book Description
Abstract: Tandem mass spectrometry is one of the most important tools for protein analysis. This thesis focuses on the development of new methods and algorithms for tandem mass spectrometry data analysis. A database search engine, MassMatrix, has also been developed that incorporates these methods and algorithms. The program is publicly available both on the web server at www.massmatrix.net and as a deliverable software package for personal computers. Three different scoring algorithms have been developed to identify and characterize proteins and peptides from tandem mass spectrometry data. The first is targeted at the next generation of tandem mass spectrometers, which are capable of high mass accuracy and resolution. Two scores calculated by the algorithm are sensitive to high mass accuracy because the algorithm explicitly incorporates mass accuracy into the scoring of potential peptide and protein matches for tandem mass spectra. The algorithm is further improved by employing Monte Carlo simulations to calculate ion-abundance-based scores without any assumptions or simplifications. For high mass accuracy data, MassMatrix provides improvements in sensitivity over other database search programs. The second scoring algorithm, based on peptide sequence tags inferred from tandem mass spectra, further improves the performance of MassMatrix for low mass accuracy tandem mass spectrometry data. The third algorithm is the first automated data analysis method that uses peptide retention times in liquid chromatography to evaluate potential peptide matches for tandem mass spectrometry data. The algorithm predicts reversed-phase liquid chromatography retention times of peptides from their hydrophobicities and compares the predicted retention times with the observed ones to evaluate the peptide matches. To handle low-quality data, a new method has also been developed to reduce noise in tandem mass spectra and screen out poor-quality spectra.
In addition, a data analysis method for identification of disulfide bonds in proteins and peptides by tandem mass spectrometry data has been developed and incorporated in MassMatrix. By this new approach, proteins and peptides with disulfide bonds can be directly identified in tandem mass spectrometry with high confidence without any chemical reduction and/or other derivatization.

Expanding the Toolbox of Tandem Mass Spectrometry with Algorithms to Identify Mass Spectra from More Than One Peptide

Author: Jian Wang
ISBN: 9781303217050
Languages: en
Pages: 124

Book Description
In high-throughput proteomics, the development of computational methods and the development of novel experimental strategies often rely on each other. In several areas, mass spectrometry methods for data acquisition are ahead of the computational methods needed to interpret the resulting tandem mass (MS/MS) spectra. While there are numerous situations where two or more peptides are co-fragmented in the same MS/MS spectrum, nearly all mainstream computational approaches still assume that each MS/MS spectrum comes from only one peptide. In this thesis we address problems in three emerging areas where computational tools that relax this assumption are crucial for the successful application of these approaches on a large scale. In the first chapter we describe algorithms for the identification of mixture spectra that come from more than one co-eluting peptide precursor. The ability to interpret mixture spectra not only improves peptide identification in traditional data-dependent acquisition (DDA) workflows but is also crucial for the successful application of emerging data-independent acquisition (DIA) techniques, which have the potential to greatly improve the throughput of peptide identification. In chapter two, we address the problem of identifying peptides with complex post-translational modifications (PTMs). Detection of PTMs is important for understanding the functional dynamics of proteins. Complex PTMs result from the conjugation of another macromolecule onto the substrate protein. The resulting modified peptides not only generate spectra that contain a mixture of fragment ions from both the PTM and the substrate peptide, but also display substantially different fragmentation patterns compared to conventional, unmodified peptides. We describe a hybrid experimental and computational approach to build search tools that capture the specific fragmentation patterns of modified peptides. Finally, in chapter three we address the problem of identifying linked peptides, which are two peptides covalently linked together. The generation and identification of linked peptides has recently been demonstrated to be a versatile tool for studying protein-protein interactions and protein structures; however, the identification of linked peptides faces many challenges. We integrate lessons learned in the previous chapters to build an efficient and sensitive tool to identify linked peptides from MS/MS spectra.

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry

Author: Han, Xi
Languages: en
Pages: 69

Book Description
Tandem mass spectrometry (MS/MS) has been routinely used to identify peptides from protein mixtures in the field of proteomics. However, only about 30% to 40% of current MS/MS spectra can be identified, while many remain unassigned even though they are of reasonable quality. The ubiquitous presence of post-translational modifications (PTMs) is one of the reasons for the current low spectral identification rate. In order to identify post-translationally modified peptides, most existing software requires the specification of a few possible modifications. However, such knowledge of possible modifications is not always available. In this thesis, we describe a new algorithm for identifying modified peptides without requiring users to specify the possible modifications before the search routine; instead, all modifications from the Unimod database are considered. Several new techniques are employed to avoid the exponential growth of the search space, as well as to control the false discoveries due to this unrestricted search approach. A software tool, PeaksPTM, has been developed and has achieved stronger performance than competing tools for unrestricted identification of post-translationally modified peptides. Another important reason for the failure of search tools is inaccurate measurement of the mass or charge state of the precursor peptide ion. In this thesis, we study the precursor monoisotopic mass and charge determination problem, and propose an algorithm to correct precursor ion mass error by assessing the isotopic features in its parent MS spectrum. The algorithm has been tested on two annotated data sets and achieved almost 100 percent accuracy. Furthermore, we have studied a more complicated problem, the MS/MS preprocessing problem, and propose a spectrum deconvolution algorithm. Experiments are presented that compare its performance with that of other existing software.
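
The relationship between isotopic peak spacing, charge state, and precursor mass that underlies the charge determination problem can be shown with a small sketch. This is not the algorithm of the thesis: it assumes the isotopic peaks of one precursor have already been picked, and it uses only the facts that neighbouring isotope peaks lie about 1.003355/z apart in m/z and that a positive ion carries z protons. Function names and the tolerance are hypothetical.

```python
PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.003355  # Da, approximate 13C - 12C mass difference


def infer_charge(isotope_mzs, max_charge=6, tol=0.01):
    """Guess the charge from the mean m/z gap between isotope peaks.

    Returns None when fewer than two peaks are given or when no charge
    in 1..max_charge explains the spacing within `tol`.
    """
    if len(isotope_mzs) < 2:
        return None
    mzs = sorted(isotope_mzs)
    gaps = [b - a for a, b in zip(mzs, mzs[1:])]
    mean_gap = sum(gaps) / len(gaps)
    best = min(range(1, max_charge + 1),
               key=lambda z: abs(mean_gap - ISOTOPE_SPACING / z))
    if abs(mean_gap - ISOTOPE_SPACING / best) > tol:
        return None
    return best


def neutral_monoisotopic_mass(mono_mz, charge):
    """Convert the monoisotopic m/z of a positive ion to a neutral mass."""
    return (mono_mz - PROTON_MASS) * charge


# Toy example: three peaks spaced about 0.5017 m/z apart.
peaks = [500.2500, 500.7517, 501.2534]
charge = infer_charge(peaks)
mass = neutral_monoisotopic_mass(peaks[0], charge)
```

The sketch takes the lowest peak as monoisotopic. Deciding whether that peak is truly the monoisotopic one is the harder part of the problem, and it is not attempted here.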

Algorithms for Characterizing Peptides and Glycopeptides with Mass Spectrometry

Author: Lin He
Languages: en
Pages: 136

Book Description
The emergence of tandem mass spectrometry (MS/MS) technology has significantly accelerated protein identification and quantification in proteomics. It enables high-throughput analysis of proteins and their quantities in a complex protein mixture. A mass spectrometer can easily and rapidly generate large volumes of mass spectral data for a biological sample. This volume of data makes manual interpretation impossible and has also brought numerous challenges in automated data analysis. Algorithmic solutions have been proposed and provide indispensable analytical support in current proteomic experiments. However, new algorithms are still needed either to improve result accuracy or to provide additional data analysis capabilities for both protein identification and quantification. Accurate identification of the proteins in a sample is the preliminary requirement of a proteomic study. In many cases, a mass spectrum cannot provide complete information to identify the peptide without ambiguity because of the inefficiency of the peptide fragmentation technique and the prevalence of noise. We propose ADEPTS to address this problem using the complementary information provided by different types of mass spectra. Meanwhile, the occurrence of post-translational modifications (PTMs) on proteins is another major issue that prevents the interpretation of a large portion of spectra. With current software tools, users have to specify possible PTMs in advance. However, the number of possible PTMs has to be limited, since specifying more PTMs leads to a longer running time and lower result accuracy. We therefore develop DeNovoPTM and PeaksPTM to provide efficient and accurate solutions. Glycosylation is one of the most frequently observed PTMs in proteomics. It plays important roles in many disease processes and has thus attracted growing research interest. However, the lack of algorithms that can identify intact glycopeptides has become the major obstacle hindering glycoprotein studies. We propose a novel algorithm, GlycoMaster DB, to fulfil this urgent requirement. Additional research is presented on protein quantification, which studies changes in protein quantity by comparing two or more mass spectral datasets. A crucial problem in quantification is correcting the retention time distortions between different datasets. Heuristic solutions from previous research have been used in practice, but none of them has stated a clear optimization goal. To address this issue, we propose a combinatorial model and practical algorithms for this problem.

Practical Bioinformatics

Author: Janusz M. Bujnicki
Publisher: Springer
ISBN: 3540742689
Category: Science
Languages: en
Pages: 275

Book Description
This book presents applications of bioinformatics tools that experimental research scientists use in "daily practice." Its interdisciplinary approach combines computational and experimental methods to solve scientific problems. The book begins with reviews of computational methods for protein sequence-structure-function analysis, followed by methods that use experimental data obtained in the laboratory to improve functional predictions.

Peptide Identification of Tandem Mass Spectrometry from Quadrupole Time-of-flight Mass Spectrometers

Author: Kuang-Ying Hsi
Languages: en
Pages: 46

Book Description
Tandem mass spectrometry (MS2) is widely used for peptide and protein identification. One of the most fundamental problems in MS2 peptide identification is scoring peptide annotations against the spectrum produced by the peptide. In this thesis, a Bayesian network model is proposed for scoring peptides from Q-TOF mass spectrometers. The research is based on the Bayesian network probabilistic methodology used by the InsPecT software, which exploits a hybrid strategy of both database search and de novo algorithms for peptide identification. Initially we focused on the connections of the InsPecT scoring model without changing its nodes, attempting to determine the connections between nodes for Q-TOF data from their dependencies. To test whether the complete set of nodes of the original InsPecT scoring model is needed, we reduced the number of nodes, which surprisingly yielded a significant improvement in peptide identification performance. The 18-node model was reduced to 10-node models for both charge 2 and charge 3 ions, giving gains in spectra identification of 37.51% for charge 2 and 57.68% for charge 3 ions compared to InsPecT version 2006.10.20. The simplified model also reduces computation time. Currently InsPecT does not perform as well as Mascot on Q-TOF data. One reason may be that InsPecT was originally trained on LTQ data, and in this thesis we focused only on improving the InsPecT scoring stage. Deficiencies may remain in the initial tagging and in the final calculation of the score. Further research could exhaustively combine fragment ions to derive a set of the most discriminative and informative ions.