Novel Data Analysis Methods and Algorithms for Identification of Peptides and Proteins by Use of Tandem Mass Spectrometry

Author: Hua Xu
Publisher:
ISBN:
Category : Bioinformatics
Languages : en
Pages :

Book Description
Abstract: Tandem mass spectrometry is one of the most important tools for protein analysis. This thesis focuses on the development of new methods and algorithms for tandem mass spectrometry data analysis. A database search engine, MassMatrix, that incorporates these methods and algorithms has also been developed. The program is publicly available both on the web server at www.massmatrix.net and as a distributable software package for personal computers. Three different scoring algorithms have been developed to identify and characterize proteins and peptides from tandem mass spectrometry data. The first targets the next generation of tandem mass spectrometers, which are capable of high mass accuracy and resolution. Two scores calculated by this algorithm are sensitive to mass accuracy because the algorithm explicitly incorporates mass accuracy into the scoring of potential peptide and protein matches for tandem mass spectra. The algorithm is further improved by using Monte Carlo simulations to calculate ion-abundance-based scores without any assumptions or simplifications. For high mass accuracy data, MassMatrix provides improvements in sensitivity over other database search programs. The second scoring algorithm, based on peptide sequence tags inferred from tandem mass spectra, further improves the performance of MassMatrix for low mass accuracy data. The third algorithm is the first automated data analysis method that uses peptide retention times in liquid chromatography to evaluate potential peptide matches for tandem mass spectrometry data. It predicts reversed-phase liquid chromatography retention times of peptides from their hydrophobicities and compares the predicted retention times with the observed ones to evaluate the peptide matches. To handle low-quality data, a new method has also been developed to reduce noise in tandem mass spectra and to screen out poor-quality spectra.
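The retention-time idea described in the abstract (predict a peptide's reversed-phase retention time from its hydrophobicity, then compare it with the observed time) can be illustrated with a small sketch. This is not the MassMatrix model: as an assumption, it uses the Kyte-Doolittle hydropathy scale as a stand-in hydrophobicity index, calibrates it with a straight-line fit on a hypothetical set of confidently identified peptides, and reports the absolute gap between predicted and observed retention time. The function names and calibration numbers are illustrative only.

```python
# Minimal sketch of a retention-time (RT) plausibility check.
# ASSUMPTION: the Kyte-Doolittle hydropathy scale is a stand-in hydrophobicity
# index; real RP-LC predictors use fitted retention coefficients instead.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity(peptide):
    """Crude hydrophobicity index: sum of per-residue hydropathy values."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide)


def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_deviation(peptide, observed_rt, slope, intercept):
    """Absolute gap between the calibrated predicted RT and the observed RT."""
    predicted = slope * hydrophobicity(peptide) + intercept
    return abs(predicted - observed_rt)


# Hypothetical calibration set: confidently identified peptides with made-up
# observed RTs (minutes) that happen to lie on RT = 2 * H + 30.
calibration = {"LLL": 52.8, "GGG": 27.6, "KKK": 6.6}
slope, intercept = fit_linear(
    [hydrophobicity(p) for p in calibration], list(calibration.values())
)

# Small deviations support a candidate match; large ones count against it.
print(round(rt_deviation("VVV", 55.0, slope, intercept), 1))  # prints 0.2
```

In a search engine, a deviation like this would be turned into one more score term alongside the fragment-ion scores rather than used as a hard cutoff.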
In addition, a data analysis method for identifying disulfide bonds in proteins and peptides from tandem mass spectrometry data has been developed and incorporated into MassMatrix. With this approach, proteins and peptides containing disulfide bonds can be identified directly by tandem mass spectrometry with high confidence, without chemical reduction or other derivatization.
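Identifying disulfide-linked peptides without reduction requires, at minimum, computing the mass of the linked species: the two peptide masses are summed and two hydrogen atoms are subtracted for each S-S bond formed. The sketch below shows only that mass rule, not the MassMatrix scoring method; the helper names and the example peptides "ACK" and "CR" are hypothetical, and only the residues needed for the example are listed.

```python
# Minimal sketch: precursor mass of two peptides joined by disulfide bonds,
# as needed to match unreduced, non-derivatized precursors.
# Monoisotopic residue masses in Da; only residues used below are listed,
# and the example peptides are hypothetical.

MONO_RESIDUE = {"A": 71.03711, "C": 103.00919, "K": 128.09496, "R": 156.10111}
WATER = 18.010565    # H2O added to every linear peptide
HYDROGEN = 1.007825  # H atom; each S-S bond removes two of them
PROTON = 1.007276    # charge carrier in positive-mode electrospray


def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in seq) + WATER


def disulfide_linked_mass(seq_a, seq_b, bonds=1):
    """Neutral mass of two peptides connected by `bonds` inter-chain disulfides."""
    if seq_a.count("C") < bonds or seq_b.count("C") < bonds:
        raise ValueError("each peptide needs one cysteine per disulfide bond")
    return peptide_mass(seq_a) + peptide_mass(seq_b) - 2 * HYDROGEN * bonds


def precursor_mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


linked = disulfide_linked_mass("ACK", "CR")
print(f"{linked:.4f} Da, 2+ at m/z {precursor_mz(linked, 2):.4f}")
```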

Mass Spectrometry Data Analysis in Proteomics

Author: Rune Matthiesen
Publisher: Springer Science & Business Media
ISBN: 1597452750
Category : Science
Languages : en
Pages : 322

Book Description
This is an in-depth guide to the theory and practice of analyzing raw mass spectrometry (MS) data in proteomics. The volume outlines the bioinformatics programs, algorithms, and databases available for MS data analysis. General guidelines for data analysis using search engines such as Mascot, X!Tandem, and VEMS are provided, with specific attention to identifying poor-quality data and optimizing search parameters.

Protein Sequencing and Identification Using Tandem Mass Spectrometry

Author: Michael Kinter
Publisher: John Wiley & Sons
ISBN: 0471231886
Category : Science
Languages : en
Pages : 321

Book Description
How to design, execute, and interpret experiments for protein sequencing using mass spectrometry
The rapid expansion of searchable protein and DNA databases in recent years has triggered an explosive growth in the application of mass spectrometry to protein sequencing. This timely and authoritative book provides professionals and scientists in biotechnology research with complete coverage of procedures for analyzing protein sequences by mass spectrometry, including step-by-step guidelines for sample preparation, analysis, and data interpretation. Michael Kinter and Nicholas Sherman present their own high-quality, laboratory-tested protocols for the analysis of a wide variety of samples, demonstrating how to carry out specific experiments and obtain fast, reliable results with a 99% success rate. Readers will get sufficient experimental detail to apply in their own laboratories, learn about the proper selection and operation of instruments, and gain essential insight into the fundamental principles of mass spectrometry and protein sequencing. Coverage includes:
* Peptide fragmentation and interpretation of product ion spectra
* Basic polyacrylamide gel electrophoresis
* Preparation of protein digests for sequencing experiments
* Mass spectrometric analysis using capillary liquid chromatography
* Techniques for protein identification by database searches
* Characterization of modified peptides using tandem mass spectrometry
* And much more
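Interpreting product ion spectra starts from the expected fragment ladder. The sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide using standard monoisotopic residue masses; it is a generic illustration of peptide fragmentation, not a protocol from the book, and the function name `by_ions` is hypothetical.

```python
# Minimal sketch: singly charged b- and y-ion m/z values for an unmodified
# peptide, the basic ladder used when interpreting product ion spectra.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(seq):
    """Return (b, y) lists of 1+ m/z values; b[i-1] is b_i, y[i-1] is y_i."""
    residues = [MONO_RESIDUE[aa] for aa in seq]
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        # b_i: the N-terminal i residues plus a proton.
        b_ions.append(sum(residues[:i]) + PROTON)
        # y_i: the C-terminal i residues plus water plus a proton.
        y_ions.append(sum(residues[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = by_ions("PEPTIDE")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} {bi:9.4f}   y{i} {yi:9.4f}")
```

A useful sanity check is that complementary ions add up: b_i + y_(n-i) equals the neutral peptide mass plus two protons.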

Mass Spectrometry of Proteins and Peptides

Author: John R. Chapman
Publisher: Springer Science & Business Media
ISBN: 1592590454
Category : Science
Languages : en
Pages : 539

Book Description
Little more than three years down the line, I am already writing the Preface to a second volume to follow Protein and Peptide Analysis by Mass Spectrometry. What has happened in between to make this second venture worthwhile? New types of mass spectrometric instrumentation have appeared, so that new techniques have become possible and existing techniques have become much more feasible. More particularly, however, the newer ionization techniques, introduced for the analysis of high molecular weight materials, have now been thoroughly used and studied. As a result, there has been an enormous improvement in the associated sample handling technology, so that these methods are now routinely applied to much smaller sample amounts as well as to more intractable samples. Again, this particular community of mass spectrometry users has both increased in number and diversified. And, riding this wave of acceptance, leaders in the field have set their sights on more complex problems: molecular interaction, ion structures, quantitation, and kinetics are just a few of the newer areas reported in Mass Spectrometry of Proteins and Peptides. As with the first volume, one purpose of this collection, Mass Spectrometry of Proteins and Peptides, is to show the reader what can be done by the application of mass spectrometry, and perhaps even to encourage the reader to venture down new paths.

Novel Methods for Improved Identification Throughput and High-resolution Scoring for Proteomics

Author: Brendan Keeley Faherty
Publisher:
ISBN:
Category :
Languages : en
Pages : 366

Book Description
The field of proteomics aims to identify and quantify the protein contents of a biological sample. The mass spectrometer is the instrument of choice for characterizing these proteins. In a typical proteomics experiment, mass spectra are collected from peptides as they elute from a liquid chromatography column and are electrosprayed into the instrument as ions. Certain peptide ions are then selected, isolated, and fragmented. The fragments are recorded as tandem mass spectra, which are lists of fragment masses and intensities, and are subsequently used for identification. After the sample has been analyzed by the mass spectrometer, a number of methods, including database searching, can be used to match each tandem mass spectrum to a peptide present in the biological sample. Historically, the time needed to identify the collected tandem mass spectra has been substantially longer than the time spent collecting them on the instrument. One of the standard database searching algorithms used for identification, SEQUEST, was published in 1994, when data analysis time was almost an afterthought because the number of collected spectra could be measured in the dozens. Today, modern mass spectrometers can collect thousands of tandem mass spectra each hour with orders of magnitude greater peak resolution. This thesis builds on the SEQUEST algorithm and focuses on using high-resolution tandem mass spectra for identification, allowing more accurate and comprehensive identifications, as well as on novel methods to increase the throughput of tandem mass spectrum analysis by database searching.
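One reason high-resolution fragment spectra help identification is that matching can use a much tighter mass tolerance, so fewer peaks match by chance. The sketch below counts theoretical fragment ions that have an observed peak within a parts-per-million (ppm) window. It does not reproduce SEQUEST's cross-correlation scoring; the function names, the peak list, and the 5 and 20 ppm tolerances are hypothetical, while the theoretical values are the b1, y1, and b2 ions of PEPTIDE computed from standard monoisotopic masses.

```python
import bisect


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed_peaks, theoretical_ions, tol_ppm):
    """Count theoretical ions that have an observed peak within +/- tol_ppm."""
    peaks = sorted(observed_peaks)
    matched = 0
    for ion in theoretical_ions:
        window = ion * tol_ppm * 1e-6  # absolute tolerance in Da at this m/z
        i = bisect.bisect_left(peaks, ion - window)
        if i < len(peaks) and peaks[i] <= ion + window:
            matched += 1
    return matched


theoretical = [98.060036, 148.060431, 227.102626]  # b1, y1, b2 of PEPTIDE
observed = [98.0601, 148.0604, 227.1050, 300.0]     # hypothetical peak list
for tol in (5, 20):
    print(tol, "ppm:", count_matches(observed, theoretical, tol), "matches")
```

The peak at 227.1050 is about 10 ppm from b2, so it is accepted at 20 ppm but rejected at 5 ppm; with accurate instruments, the tighter setting discards such borderline matches.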

Computational and Statistical Methods for Protein Quantification by Mass Spectrometry

Author: Ingvar Eidhammer
Publisher: John Wiley & Sons
ISBN: 111849377X
Category : Mathematics
Languages : en
Pages : 290

Book Description
The definitive introduction to data analysis in quantitative proteomics
This book provides all the necessary knowledge about mass spectrometry-based proteomics methods and the computational and statistical approaches needed to plan, design, and analyze quantitative proteomics experiments. The author's carefully constructed approach allows readers to easily make the transition into the field of quantitative proteomics. Through detailed descriptions of wet-lab methods, computational approaches, and statistical tools, this book covers the full scope of a quantitative experiment, allowing readers to acquire new knowledge while also serving as a useful reference work for more advanced readers. Computational and Statistical Methods for Protein Quantification by Mass Spectrometry:
* Introduces the use of mass spectrometry in protein quantification and how the bioinformatics challenges in this field can be solved using statistical methods and various software programs.
* Is illustrated by a large number of figures and examples as well as numerous exercises.
* Provides both clear and rigorous descriptions of methods and approaches.
* Is thoroughly indexed and cross-referenced, combining the strengths of a textbook with the utility of a reference work.
* Features detailed discussions of both wet-lab approaches and statistical and computational methods.
With clear and thorough descriptions of the various methods and approaches, this book is accessible to biologists, informaticians, and statisticians alike, and is aimed at readers across the academic spectrum, from advanced undergraduate students to postdoctoral researchers entering the field.

Expanding the Toolbox of Tandem Mass Spectrometry with Algorithms to Identify Mass Spectra from More Than One Peptide

Author: Jian Wang
Publisher:
ISBN: 9781303217050
Category :
Languages : en
Pages : 124

Book Description
In high-throughput proteomics, computational methods and novel experimental strategies often depend on each other for their development. In several areas, mass spectrometry methods for data acquisition are ahead of the computational methods needed to interpret the resulting tandem mass (MS/MS) spectra. Although there are numerous situations where two or more peptides are co-fragmented in the same MS/MS spectrum, nearly all mainstream computational approaches still assume that each MS/MS spectrum comes from only one peptide. In this thesis, we addressed problems in three emerging areas where computational tools that relax this assumption are crucial for the successful large-scale application of these approaches. In the first chapter, we describe algorithms for identifying mixture spectra that arise from more than one co-eluting peptide precursor. The ability to interpret mixture spectra not only improves peptide identification in traditional data-dependent acquisition (DDA) workflows but is also crucial for the successful application of emerging data-independent acquisition (DIA) techniques, which have the potential to greatly improve the throughput of peptide identification. In chapter two, we address the problem of identifying peptides with complex post-translational modifications (PTMs). Detecting PTMs is important for understanding the functional dynamics of proteins. Complex PTMs result from the conjugation of another macromolecule onto the substrate protein. The resulting modified peptides not only generate spectra that contain a mixture of fragment ions from both the PTM and the substrate peptide, but also display substantially different fragmentation patterns compared with conventional, unmodified peptides. We describe a hybrid experimental and computational approach to building search tools that capture the specific fragmentation patterns of modified peptides.
Finally, in chapter three, we address the problem of identifying linked peptides, which are two peptides covalently linked together. The generation and identification of linked peptides has recently been demonstrated to be a versatile tool for studying protein-protein interactions and protein structures; however, the identification of linked peptides faces many challenges. We integrate lessons learned in the previous chapters to build an efficient and sensitive tool to identify linked peptides from MS/MS spectra.
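The motivation for mixture-spectrum identification can be shown with a toy calculation: when two peptides are co-fragmented, neither candidate alone explains all the peaks, but the pair together does. The sketch below measures the fraction of observed peaks within a fixed tolerance of any theoretical fragment. It is not the thesis's algorithm; a real mixture scorer must also penalize the extra peaks a second candidate would explain by chance. The fragment lists, spectrum, and 0.02 Da tolerance are hypothetical.

```python
import bisect


def explained_fraction(observed, theoretical, tol=0.02):
    """Fraction of observed peaks lying within tol (Da) of a theoretical fragment."""
    theo = sorted(theoretical)
    hits = 0
    for mz in observed:
        i = bisect.bisect_left(theo, mz - tol)
        if i < len(theo) and theo[i] <= mz + tol:
            hits += 1
    return hits / len(observed)


# Hypothetical fragment lists for two co-eluting candidate peptides.
frags_a = [100.0, 200.0, 300.0]
frags_b = [150.0, 250.0]
spectrum = [100.0, 150.01, 200.0, 250.0, 300.0, 400.0]

print(explained_fraction(spectrum, frags_a))            # peptide A alone
print(explained_fraction(spectrum, frags_b))            # peptide B alone
print(explained_fraction(spectrum, frags_a + frags_b))  # A and B together
```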

Proteome Informatics

Author: Conrad Bessant
Publisher: Royal Society of Chemistry
ISBN: 1782626735
Category : Science
Languages : en
Pages : 429

Book Description
The field of proteomics has developed rapidly over the past decade, creating a need for a detailed introduction to the various informatics topics that underpin the main liquid chromatography tandem mass spectrometry (LC-MS/MS) protocols used for protein identification and quantitation. Proteins are a key component of any biological system, and monitoring proteins using LC-MS/MS proteomics is becoming commonplace in a wide range of biological research areas. However, many researchers treat proteomics software tools as a black box, drawing conclusions from their output without considering the nuances and limitations of the algorithms on which the software is based. This book seeks to address this situation by bringing together world experts to provide clear explanations of the key algorithms, workflows, and analysis frameworks, so that users of proteomics data can be confident that they are using appropriate tools in suitable ways.

Protein and Peptide Mass Spectrometry in Drug Discovery

Author: Michael L. Gross
Publisher: John Wiley & Sons
ISBN: 1118116542
Category : Medical
Languages : en
Pages : 484

Book Description
The book that highlights mass spectrometry and its application in characterizing proteins and peptides in drug discovery
Mass spectrometry (MS), an instrumental analytical method for measuring mass and characterizing samples ranging from small molecules to large proteins, has become one of the most widely used techniques for studying proteins and peptides over the last decade. Bringing together the work of experts in academia and industry, Protein and Peptide Mass Spectrometry in Drug Discovery highlights current analytical approaches, industry practices, and modern strategies for the characterization of both peptides and proteins in drug discovery. Illustrating the critical role MS technology plays in characterizing target proteins and protein products, the book discusses the methods used, ion mobility, and the use of microwave radiation to speed proteolysis, and also covers important emerging applications in neuroproteomics and antigenic peptides. Placing an emphasis on the pharmaceutical industry, the book stresses practice and applications, presents real-world examples covering the most recent advances in mass spectrometry, and provides an invaluable resource for pharmaceutical scientists in industry and academia, analytical and bioanalytical chemists, and researchers in protein science and proteomics.

Novel Data Analysis Approaches for Cross-linking Mass Spectrometry Proteomics and Glycoproteomics

Author: Lei Lu
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Bottom-up proteomics has emerged as a powerful technology for biological studies. The technique is used for a myriad of purposes, including protein identification, post-translational modification identification, protein-protein interaction analysis, protein quantification, and protein structure analysis. The data analysis approaches of bottom-up proteomics have evolved over the past two decades, and many different algorithms and software programs have been developed for these varied purposes. In this thesis, I have focused on improving database search strategies for important special applications of bottom-up proteomics, including cross-linking mass spectrometry proteomics and O-glycoproteomics. In cross-linking mass spectrometry proteomics, a sample of proteins is treated with a chemical cross-linking reagent. This causes peptides within the proteins to be cross-linked to one another, forming peptide doublets that are released by treating the sample with a protease such as trypsin. The data analysis tools are designed to identify these cross-linked peptides. In O-glycoproteomics, the peptides released by protease digestion of the protein sample can be modified with any one of, or even several distinct, O-glycans, and the data analysis tools should be able to identify all of the glycans and the modification sites at which they are located. In both cases, traditional database search strategies, which try to match the experimental spectra to all potential theoretical spectra, are not practical because of the large increase in search space. Researchers have lacked efficient data analysis tools for these two applications. Here we successfully devised new search algorithms to address these problems and implemented them in two new software modules in our laboratory's bottom-up proteomics software engine, MetaMorpheus (cross-linking data analysis via MetaMorpheusXL and O-glycoproteomics data analysis via O-Pair Search).
The new search strategies used in the software program are both based on ion-indexed open search, which was first developed for large-scale proteomic studies in the programs MSFragger and Open-pFind. In this study, ion-indexed open search was optimized for cross-linking mass spectrometry proteomics and O-glycoproteomics and combined with other algorithms. In O-glycoproteomics, a graph-based algorithm is used to speed up the identification and localization of O-glycans. Other useful features have been added to the software program, such as analysis of both cleavable and non-cleavable cross-links in the cross-link search module and calculation of localization probabilities in the O-glyco search module. Further optimizations, including machine learning methods for false discovery rate (FDR) analysis, retention time prediction, and spectral prediction, could further improve the current best search approaches for cross-link proteomics and O-glycoproteomics data analysis. Chapter 1 provides an overview of bottom-up proteomics data analysis methods and outlines how ion-indexed open search could be useful for special bottom-up proteomics studies. Chapter 2 describes the development of a cross-linking mass spectrometry proteomics search module, resulting in efficiency improvements for both cleavable and non-cleavable cross-link proteomics data analysis. Chapter 3 describes the development of an O-glycoproteomics search module; by combining the ion-indexed open search algorithm with the graph-based localization algorithm, O-Pair Search is more than 2,000 times faster than Byonic, a currently widely used software program. In Chapter 4, a novel top-down data acquisition method is described. Chapter 5 provides conclusions and future directions.
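The core idea behind ion-indexed open search can be sketched in a few lines: precompute an index from binned fragment m/z to the peptides that produce each fragment, then score a spectrum by looking up each peak instead of comparing it with every theoretical spectrum. This is a minimal illustration of the general fragment-index concept, not the MetaMorpheus, MSFragger, or Open-pFind implementation; the function names, the 0.02 bin width, and the two-peptide database are hypothetical.

```python
from collections import defaultdict


def build_fragment_index(peptide_fragments, bin_width=0.02):
    """Map each binned fragment m/z to the set of peptides producing it."""
    index = defaultdict(set)
    for pep_id, fragments in peptide_fragments.items():
        for mz in fragments:
            index[round(mz / bin_width)].add(pep_id)
    return index


def score_spectrum(index, peaks, bin_width=0.02):
    """Count, per peptide, the peaks that fall in (or next to) its fragment bins.

    No precursor-mass filter is applied, which is what makes the search "open":
    a peptide can rank highly even if its mass differs from the precursor,
    leaving the mass difference to be explained later (e.g. by a modification).
    """
    counts = defaultdict(int)
    for mz in peaks:
        center = round(mz / bin_width)
        hits = set()
        for bin_key in (center - 1, center, center + 1):  # tolerate bin edges
            hits |= index.get(bin_key, set())
        for pep_id in hits:  # each peak counts at most once per peptide
            counts[pep_id] += 1
    return dict(counts)


# Hypothetical database of two peptides with precomputed fragment m/z values.
index = build_fragment_index({
    "pepA": [98.06, 148.06, 227.10],
    "pepB": [120.08, 227.10, 300.15],
})
print(score_spectrum(index, [98.06, 148.06, 227.10, 500.0]))
```

Because lookups cost roughly one dictionary access per peak, the search space can grow (for example, with many glycan or cross-link combinations) without a matching growth in per-spectrum comparison work.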