Practical Methods for Approximate String Matching

Author: Heikki Hyyrö
Publisher:
ISBN: 9789514458187
Category : Information retrieval
Languages : en
Pages : 105

Book Description
Abstract: "Given a pattern string and a text, the task of approximate string matching is to find all locations in the text that are similar to the pattern. This type of search may be done for example in applications of spelling error correction or bioinformatics. Typically edit distance is used as the measure of similarity (or distance) between two strings. In this thesis we concentrate on unit-cost edit distance that defines the distance between two strings as the minimum number of edit operations that are needed in transforming one of the strings into the other. More specifically, we discuss the Levenshtein and the Damerau edit distances. Aproximate [sic] string matching algorithms can be divided into off-line and on-line algorithms depending on whether they may or may not, respectively, preprocess the text. In this thesis we propose practical algorithms for both types of approximate string matching as well as for computing edit distance. Our main contributions are a new variant of the bit-parallel approximate string matching algorithm of Myers, a method that makes it easy to modify many existing Levenshtein edit distance algorithms into using the Damerau edit distance, a bit-parallel algorithm for computing edit distance, a more error tolerant version of the ABNDM algorithm, a two-phase filtering scheme, a tuned indexed approximate string matching method for genome searching, and an improved and extended version of the hybrid index of Navarro and Baeza-Yates. To evaluate their practicality, we compare most of the proposed methods with previously existing algorithms. The test results support the claim of the title of this thesis that our proposed algorithms work well in practice."
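The unit-cost edit distances named in the abstract can be illustrated with the textbook dynamic-programming recurrences. The sketch below is not taken from the thesis, whose contributions are bit-parallel and index-based methods. The function names are hypothetical, and the Damerau variant shown is the restricted form (often called optimal string alignment), which is an assumption about the definition intended.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def damerau_restricted(a: str, b: str) -> int:
    """Levenshtein plus transposition of two adjacent characters.

    Restricted variant (assumption): a transposed pair is not edited again.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ab" -> "ba", costs one operation.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, levenshtein("ab", "ba") counts two substitutions, while damerau_restricted("ab", "ba") counts a single transposition.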

String Searching Algorithms

Author: Graham A. Stephen
Publisher: World Scientific
ISBN: 9789810237035
Category : Computers
Languages : en
Pages : 260

Book Description
A bibliographic overview of string searching and an anthology of descriptions of the principal algorithms available. Topics covered include methods for finding exact and approximate string matches, calculating "edit" distances between strings, and finding common

Combinatorial Pattern Matching

Author: Dan Hirschberg
Publisher: Springer Science & Business Media
ISBN: 9783540612582
Category : Computers
Languages : en
Pages : 408

Book Description
This book constitutes the refereed proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching, CPM '96, held in Laguna Beach, California, USA, in June 1996. The 26 revised full papers included were selected from a total of 48 submissions; also included are two invited papers. Combinatorial pattern matching has become a full-fledged area of algorithmics with important applications in recent years. The book addresses all relevant aspects of combinatorial pattern matching and its importance in information retrieval, pattern recognition, compiling, data compression, program analysis, and molecular biology and thus describes the state of the art in the area.

String Processing and Information Retrieval

Author: Mariano Consens
Publisher: Springer
ISBN: 3540322418
Category : Computers
Languages : en
Pages : 419

Book Description
This book constitutes the refereed proceedings of the 12th International Conference on String Processing and Information Retrieval, SPIRE 2005, held in Buenos Aires, Argentina in November 2005. The 27 revised full papers and 17 revised short papers presented were carefully reviewed and selected from 102 submissions. The papers address current issues in all aspects of string processing, information retrieval, pattern matching, computational biology, semi-structured data, and related applications.

Combinatorial Pattern Matching

Author: Alberto Apostolico
Publisher: Springer
ISBN: 3540315624
Category : Computers
Languages : en
Pages : 453

Book Description
This book constitutes the refereed proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching, CPM 2005, held on Jeju Island, Korea, on June 19-22, 2005. The 37 revised full papers presented were carefully reviewed and selected from 129 submissions. They constitute original research contributions in combinatorial pattern matching and its applications. Among the application fields addressed are computational biology, bioinformatics, genomics, proteinomics, data compression, sequence analysis and graphs, information retrieval, data analysis, and pattern recognition.

Bioinformatics Research and Applications

Author: Ion Măndoiu
Publisher: Springer Science & Business Media
ISBN: 3540720308
Category : Science
Languages : en
Pages : 1331

Book Description
This book constitutes the refereed proceedings of the Third International Symposium on Bioinformatics Research and Applications, ISBRA 2007, held in Atlanta, GA, USA in May 2007. The 55 revised full papers presented together with three invited talks cover a wide range of topics, including clustering and classification, gene expression analysis, gene networks, genome analysis, motif finding, pathways, protein structure prediction, protein domain interactions, phylogenetics, and software tools.

String Searching Algorithms

Author: Graham A. Stephen
Publisher: World Scientific
ISBN: 9814501867
Category : Computers
Languages : en
Pages : 257

Book Description
String searching is a subject of both theoretical and practical interest in computer science. This book presents a bibliographic overview of the field and an anthology of detailed descriptions of the principal algorithms available. The aim is twofold: on the one hand, to provide an easy-to-read comparison of the available techniques in each area, and on the other, to furnish the reader with a reference to in-depth descriptions of the major algorithms. Topics covered include methods for finding exact and approximate string matches, calculating ‘edit’ distances between strings, finding common sequences and finding the longest repetitions within strings. For clarity, all the algorithms are presented in a uniform format and notation.

Efficient Approximate String Matching Techniques for Sequence Alignment

Author: Santiago Marco-Sola
Publisher:
ISBN:
Category :
Languages : en
Pages : 213

Book Description
One of the outstanding milestones achieved in recent years in the field of biotechnology research has been the development of high-throughput sequencing (HTS). Because it is currently technically impossible to decode the genome as a whole, HTS technologies read billions of relatively short chunks of a genome at random locations. Such reads then need to be located within a reference for the species being studied (that is, aligned or mapped to the genome): for each read, one identifies regions in the reference that share a large sequence similarity with it, thereby indicating what the read's point or points of origin may be. HTS technologies are able to re-sequence a human individual (i.e. to establish the differences between his/her individual genome and the reference genome for the human species) in a very short period of time. They have also paved the way for the development of a number of new protocols and methods, leading to novel insights in genomics and biology in general. However, HTS technologies also pose a challenge to traditional data analysis methods; this is due to the sheer amount of data to be processed and the need for improved alignment algorithms that can generate accurate results quickly. This thesis tackles the problem of sequence alignment as a step within the analysis of HTS data. Its contributions focus on both the methodological aspects and the algorithmic challenges towards efficient, scalable, and accurate HTS mapping. From a methodological standpoint, this thesis strives to establish a comprehensive framework able to assess the quality of HTS mapping results. In order to do so, one has to understand the source and nature of mapping conflicts, and explore the accuracy limits inherent in how sequence alignment is performed for current HTS technologies. From an algorithmic standpoint, this work introduces state-of-the-art index structures and approximate string matching algorithms. They contribute novel insights that can be used in practical applications towards efficient and accurate read mapping. In more detail, first we present methods able to reduce the storage space taken by indexes for genome-scale references, while still providing fast query access in order to support effective search algorithms. Second, we describe novel filtering techniques that vastly reduce the computational requirements of sequence mapping, but are nonetheless capable of giving strict algorithmic guarantees on the completeness of the results. Finally, this thesis presents new incremental algorithmic techniques able to combine several approximate string matching algorithms; this leads to efficient and flexible search algorithms allowing the user to reach arbitrary search depths. All algorithms and methodological contributions of this thesis have been implemented as components of a production aligner, the GEM-mapper, which is publicly available, widely used worldwide and cited by a sizeable body of literature. It offers flexible and accurate sequence mapping while outperforming other HTS mappers both in running time and in the quality of the results it produces.

Approximate String Processing

Author: Marios Hadjieleftheriou
Publisher: Now Publishers Inc
ISBN: 1601984189
Category : Computers
Languages : en
Pages : 151

Book Description
Focuses on the problem of approximate string matching and surveys indexing techniques and algorithms specifically designed for this purpose. It concentrates on inverted indexes, filtering techniques, and tree data structures that can be used to evaluate a variety of set-based and edit-based similarity functions.

Mathematical Foundations of Computer Science, 1991

Author: Andrzej Tarlecki
Publisher: Springer
ISBN:
Category : Computers
Languages : en
Pages : 456

Book Description
Proceedings of the 16th International Symposium on Mathematical Foundations of Computer Science, held in Kazimierz Dolny, Poland, September 1991. Principal areas of focus include software specification and development, parallel and distributed computing, semantics and logics of programs, algorithms, and complexity and computability theory. No index. Annotation copyrighted by Book News, Inc., Portland, OR