Algorithms for Tandem Mass Spectrometry-based Proteomics

Author: Ari Michael Frank
Publisher:
ISBN:
Category :
Languages : en
Pages : 205

Book Description
Tandem mass spectrometry (MS/MS) has emerged as the leading technology for high-throughput proteomics analysis, making it possible to rapidly identify and characterize thousands of different proteins in complex biological samples. In recent years we have witnessed a dramatic increase in the capability to acquire proteomics MS/MS data. To avoid computational bottlenecks, this growth in acquisition power must be accompanied by a comparable improvement in analysis capabilities. In this dissertation we present several algorithms we developed to meet some of the major computational challenges that have arisen in MS/MS analysis. Throughout our work we continually address two (sometimes overlapping) problems: how to make MS/MS-based sequence identifications more accurate, and how to make the identification process work much faster. Much of the work we present revolves around algorithms for de novo sequencing of peptides, which aims to discover the amino acid sequence of protein digests (peptides), solely from their experimental mass spectrum. We start off by describing a new scoring model which is used in our de novo sequencing algorithm called PepNovo. Our scoring scheme is based on a graphical model decomposition that describes many of the conditions that determine the intensities of fragment ions observed in mass spectra, such as dependencies between related fragment ions and the influence of the amino acids adjacent to the cleavage site. Besides predicting whole peptide sequences, one of the most useful applications of de novo algorithms is to generate short sequence tags for the purpose of database filtration. We demonstrate how using these tags speeds up database searches by two orders of magnitude compared to conventional methods. We extend the use of tag filtration and show that with high-resolution data, our de novo sequencing is accurate enough to enable extremely rapid identification via direct hash lookup of peptide sequences. 
The vast amount of MS/MS data that has become available has made it possible to use advanced data-driven machine learning methods to devise more acute algorithms. We describe a new scoring function for peptide-spectrum matches that uses the RankBoost ranking algorithm to learn and model the influences of the many intricate processes that occur during peptide fragmentation. Our method's superior discriminatory power boosts PepNovo's performance beyond the current state-of-the-art de novo sequencing algorithms. Our score also greatly improves the performance of database search programs, significantly increasing both their speed and sensitivity. When we applied our method to the challenging task of a proteogenomic search against a six-frame translation of the human genome, we were able to increase the number of peptide identifications by 60% compared to current techniques. To help speed up MS/MS analysis, we developed a clustering algorithm that exploits the redundancy that is inherent in large mass spectrometry datasets (these often contain hundreds and even thousands of spectra of the same peptide). When applied to large MS/MS datasets on the order of ten million spectra, our clustering algorithm reduces the number of spectra by an order of magnitude, without losing peptide identifications. Finally, we touch upon sequencing of intact proteins ("top-down" analysis), which, from a computational perspective, is only in its infancy: very few algorithms have been developed for analysis of this type of data. We developed MS-TopDown, which uses the Spectral Alignment algorithm to characterize protein forms (i.e., determine the modification/mutation sites). Our algorithm can handle heavily modified proteins and can also distinguish between several isobaric protein forms present in the same spectrum.
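The tag-based database filtration mentioned in the description can be illustrated with a minimal sketch. This is not PepNovo's implementation: the function names, the fixed tag length of 3, and the toy peptide list are all assumptions made for illustration, and a real filter would also check the masses flanking each tag. Isoleucine is folded into leucine because the two residues have the same mass.

```python
from typing import Dict, List

def normalize(seq: str) -> str:
    """Treat I and L as the same residue, since they are isobaric."""
    return seq.replace("I", "L")

def build_tag_index(peptides: List[str], tag_len: int = 3) -> Dict[str, List[int]]:
    """Map every length-`tag_len` substring to the peptides that contain it.

    Hypothetical helper for illustration only.
    """
    index: Dict[str, List[int]] = {}
    for i, pep in enumerate(peptides):
        seq = normalize(pep)
        # Use a set so a repeated substring is recorded once per peptide.
        tags = {seq[k:k + tag_len] for k in range(len(seq) - tag_len + 1)}
        for tag in tags:
            index.setdefault(tag, []).append(i)
    return index

def filter_candidates(index: Dict[str, List[int]], tags: List[str]) -> List[int]:
    """Return indices of peptides that contain at least one of the tags."""
    hits = set()
    for tag in tags:
        hits.update(index.get(normalize(tag), []))
    return sorted(hits)

# Toy database and tags (assumed data, not taken from the dissertation).
PEPTIDES = ["LVNELTEFAK", "AEFVEVTK", "YLYEIAR", "HLVDEPQNLIK"]
index = build_tag_index(PEPTIDES)
candidates = filter_candidates(index, ["TEF", "EFA"])
print([PEPTIDES[i] for i in candidates])
```

Only the peptides returned by the filter would then need to be scored in full against the spectrum, which is where the speed-up in this kind of approach comes from.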

High-Performance Algorithms for Mass Spectrometry-Based Omics

Author: Fahad Saeed
Publisher: Springer Nature
ISBN: 3031019601
Category : Science
Languages : en
Pages : 146

Book Description
To date, processing of high-throughput mass spectrometry (MS) data is accomplished using serial algorithms. Developing new methods to process MS data is an active area of research, but there is no single strategy that focuses on the scalability of MS-based methods. Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules, and metabolites in complex biological mixtures. In recent years the technology has rapidly evolved and is now capable of generating increasingly large (multiple terabytes per experiment) and complex (multiple species/microbiome/high-dimensional) data sets. This rapid advance in MS instrumentation must be matched by an equally rapid evolution of scalable methods for analyzing these complex data sets. Ideally, the new methods should leverage the rich, heterogeneous computational resources that are now ubiquitous in the form of multicore, manycore, CPU-GPU, CPU-FPGA, and Intel Phi architectures. The absence of these high-performance computing algorithms now hinders scientific advancement in mass spectrometry research. In this book we illustrate the need for high-performance computing algorithms for MS-based proteomics and proteogenomics, and showcase our progress in developing these high-performance algorithms.

Development of Algorithms for Mass Spectrometry Based Proteomics

Author: Lukas Reiter
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Novel Data Analysis Methods and Algorithms for Identification of Peptides and Proteins by Use of Tandem Mass Spectrometry

Author: Hua Xu
Publisher:
ISBN:
Category : Bioinformatics
Languages : en
Pages :

Book Description
Abstract: Tandem mass spectrometry is one of the most important tools for protein analysis. This thesis is focused on the development of new methods and algorithms for tandem mass spectrometry data analysis. A database search engine, MassMatrix, has also been developed that incorporates these methods and algorithms. The program is publicly available both on the web server at www.massmatrix.net and as a deliverable software package for personal computers. Three different scoring algorithms have been developed to identify and characterize proteins and peptides by use of tandem mass spectrometry data. The first one is targeted at the next generation of tandem mass spectrometers that are capable of high mass accuracy and resolution. Two scores calculated by the algorithm are sensitive to high mass accuracy due to the fact that this new algorithm explicitly incorporates mass accuracy into scoring potential peptide and protein matches for tandem mass spectra. The algorithm is further improved by employing Monte Carlo Simulations to calculate ion abundance based scores without any assumptions or simplifications. For high mass accuracy data, MassMatrix provides improvements in sensitivity over other database search programs. The second scoring algorithm based on peptide sequence tags inferred from tandem mass spectra further improves the performance of MassMatrix for low mass accuracy tandem mass spectrometry data. The third algorithm is the first automated data analysis method that uses peptide retention times in liquid chromatography to evaluate potential peptide matches for tandem mass spectrometry data. The algorithm predicts reverse phase liquid chromatography retention times of peptides by their hydrophobicities and compares the predicted retention times with the observed ones to evaluate the peptide matches. In order to handle low quality data, a new method has also been developed to reduce noise in tandem mass spectra and screen poor quality spectra. 
In addition, a data analysis method for identification of disulfide bonds in proteins and peptides by tandem mass spectrometry data has been developed and incorporated in MassMatrix. By this new approach, proteins and peptides with disulfide bonds can be directly identified in tandem mass spectrometry with high confidence without any chemical reduction and/or other derivatization.
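The role of mass accuracy in evaluating candidate peptides can be illustrated with a small sketch. This is not MassMatrix's scoring algorithm: it only shows the underlying arithmetic of a parts-per-million (ppm) mass error and a simple tolerance check. The function names and the 10 ppm default are assumptions for illustration; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added for the free N- and C-termini

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER

def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def within_tolerance(observed: float, seq: str, tol_ppm: float = 10.0) -> bool:
    """Hypothetical filter: keep a candidate only if its mass error is small."""
    return abs(ppm_error(observed, peptide_mass(seq))) <= tol_ppm

theoretical = peptide_mass("AEFVEVTK")
print(round(ppm_error(theoretical + 0.005, theoretical), 2))
print(within_tolerance(theoretical + 0.005, "AEFVEVTK"))
```

A tighter tolerance excludes more wrong candidates, which is why high-accuracy instruments make this kind of check more discriminating.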

Mass Spectrometry Data Analysis in Proteomics

Author: Rune Matthiesen
Publisher: Springer Science & Business Media
ISBN: 1597452750
Category : Science
Languages : en
Pages : 322

Book Description
This is an in-depth guide to the theory and practice of analyzing raw mass spectrometry (MS) data in proteomics. The volume outlines the bioinformatics programs, algorithms, and databases available for MS data analysis. General guidelines for data analysis using search engines such as Mascot, X!Tandem, and VEMS are provided, with specific attention to identifying poor-quality data and optimizing search parameters.

Practical Bioinformatics

Author: Janusz M. Bujnicki
Publisher: Springer
ISBN: 3540742689
Category : Science
Languages : en
Pages : 275

Book Description
This book presents applications of bioinformatics tools that experimental research scientists use in "daily practice." Its interdisciplinary approach combines computational and experimental methods to solve scientific problems. The book begins with reviews of computational methods for protein sequence-structure-function analysis, followed by methods that use experimental data obtained in the laboratory to improve functional predictions.

Mass Spectrometry Data Analysis in Proteomics

Author: Rune Matthiesen
Publisher:
ISBN: 9781627033923
Category : Mass spectrometry
Languages : en
Pages : 405

Book Description
Since the publication of the first edition, the methodologies and instrumentation involved in the field of mass spectrometry-based proteomics have improved considerably. Fully revised and expanded, Mass Spectrometry Data Analysis in Proteomics, Second Edition presents expert chapters on specific MS-based methods or data analysis strategies in proteomics. The volume covers data analysis topics relevant for quantitative proteomics, post-translational modification, HX-MS, glycomics, and data exchange standards, among other topics. Written in the highly successful Methods in Molecular Biology series format, chapters include brief introductions to their respective subjects, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Updated and authoritative, Mass Spectrometry Data Analysis in Proteomics, Second Edition serves as a detailed guide for all researchers seeking to further our knowledge in the field of proteomics.

Algorithms for Peptide Identification by Tandem Mass Spectrometry

Author: Franz Roos
Publisher:
ISBN:
Category :
Languages : en
Pages : 144

Book Description


Mass Spectrometry-Based Chemical Proteomics

Author: W. Andy Tao
Publisher: John Wiley & Sons
ISBN: 1118969553
Category : Science
Languages : en
Pages : 448

Book Description
Provides strategies and concepts for understanding chemical proteomics and for analyzing protein functions, modifications, and interactions, emphasizing mass spectrometry throughout.
Covering mass spectrometry for chemical proteomics, this book helps readers understand analytical strategies behind protein functions, their modifications and interactions, and applications in drug discovery. It provides a basic overview and presents concepts in chemical proteomics through three angles: Strategies, Technical Advances, and Applications. Chapters cover many technical advances and applications in drug discovery, from target identification to validation and potential treatments. The first section of Mass Spectrometry-Based Chemical Proteomics starts by reviewing basic methods and recent advances in mass spectrometry for proteomics, including shotgun proteomics, quantitative proteomics, and data analyses. The next section covers a variety of techniques and strategies coupling chemical probes to MS-based proteomics to provide functional insights into the proteome. The last section focuses on using chemical strategies to study protein post-translational modifications and high-order structures.
Summarizes chemical proteomics, up-to-date concepts, analysis, and target validation.
Covers fundamentals and strategies, including the profiling of enzyme activities and protein-drug interactions.
Explains technical advances in the field and describes shotgun proteomics, quantitative proteomics, and the corresponding software and database methods for proteomics.
Includes a wide variety of applications in drug discovery, from kinase inhibitors and intracellular drug targets to the chemoproteomics analysis of natural products.
Addresses an important tool in small molecule drug discovery, appealing to both academia and the pharmaceutical industry.
Mass Spectrometry-Based Chemical Proteomics is an excellent source of information for readers in both academia and industry in a variety of fields, including pharmaceutical sciences, drug discovery, molecular biology, bioinformatics, and analytical sciences.

Algorithms for Peptide Identification Via Tandem Mass Spectrometry

Author: Thomas Tschager
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description