Highly Efficient String Similarity Search and Join Over Compressed Indexes

Author: Guorui Xiao
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
String similarity search and string similarity join are essential operations in many fields. Existing solutions adopt a filter-and-verification framework and build inverted indexes based on generated signatures to prune dissimilar candidates. While existing solutions mainly focus on improving query processing performance, little attention is paid to reducing the memory consumption of the inverted indexes. When the index is larger than the available memory, users must employ more expensive disk-based algorithms rather than in-memory ones. In this thesis, we propose a flexible framework, CSS, that reduces the index size while keeping high query performance for string search and join applications. We give improved solutions for offline inverted list construction and introduce a new approach for the online construction of compressed inverted lists. Experimental results on large-scale datasets demonstrate that CSS can reduce memory consumption by up to 5 times while achieving similar, or even better, query processing performance.

Database Systems for Advanced Applications

Author: Shamkant B. Navathe
Publisher: Springer
ISBN: 3319320254
Category : Computers
Languages : en
Pages : 560

Book Description
This two-volume set, LNCS 9642 and LNCS 9643, constitutes the refereed proceedings of the 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016, held in Dallas, TX, USA, in April 2016. The 61 full papers presented were carefully reviewed and selected from a total of 183 submissions. The papers cover the following topics: crowdsourcing, data quality, entity identification, data mining and machine learning, recommendation, semantics computing and knowledge base, textual data, social networks, complex queries, similarity computing, graph databases, and miscellaneous, advanced applications.

Algorithms for Next-Generation Sequencing Data

Author: Mourad Elloumi
Publisher: Springer
ISBN: 3319598260
Category : Computers
Languages : en
Pages : 356

Book Description
The 14 contributed chapters in this book survey the most recent developments in high-performance algorithms for NGS data, offering fundamental insights and technical information specifically on indexing, compression and storage; error correction; alignment; and assembly. The book will be of value to researchers, practitioners and students engaged with bioinformatics, computer science, mathematics, statistics and life sciences.

Scientific and Statistical Database Management

Author: Michael Gertz
Publisher: Springer Science & Business Media
ISBN: 3642138179
Category : Computers
Languages : en
Pages : 673

Book Description
This book constitutes the proceedings of the 22nd International Conference on Scientific and Statistical Database Management, SSDBM 2010, held in Heidelberg, Germany in June/July 2010. The 30 long and 11 short papers presented were carefully reviewed and selected from 94 submissions. The topics covered are query processing; scientific data management and analysis; data mining; indexes and data representation; scientific workflow and provenance; and data stream processing.

Query-efficient Algorithm for String Similarity Search

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Euro-Par 2011: Parallel Processing Workshops

Author: Michael Alexander
Publisher: Springer
ISBN: 3642297404
Category : Computers
Languages : en
Pages : 502

Book Description
This book constitutes the thoroughly refereed post-conference proceedings of the workshops of the 17th International Conference on Parallel Computing, Euro-Par 2011, held in Bordeaux, France, in August 2011. The papers of these 12 workshops CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS HPCF, PROPER, CCPI, and VHPC focus on the promotion and advancement of all aspects of parallel and distributed computing.

High-Dimensional Indexing

Author: Cui Yu
Publisher: Springer
ISBN: 9783540441991
Category : Computers
Languages : en
Pages : 156

Book Description
In this monograph, we study the problem of high-dimensional indexing and systematically introduce two efficient index structures: one for range queries and the other for similarity queries. Extensive experiments and comparison studies are conducted to demonstrate the superiority of the proposed indexing methods. Many new database applications, such as multimedia databases or stock price information systems, transform important features or properties of data objects into high-dimensional points. Searching for objects based on these features is thus a search of points in this feature space. To support efficient retrieval in such high-dimensional databases, indexes are required to prune the search space. Indexes for low-dimensional databases are well studied, whereas most of these application-specific indexes are not scalable with the number of dimensions, and they are not designed to support similarity searches and high-dimensional joins.

Principles of Distributed Database Systems

Author: M. Tamer Özsu
Publisher: Springer Science & Business Media
ISBN: 1441988343
Category : Computers
Languages : en
Pages : 856

Book Description
This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud computing and streaming data applications, has forced a renewal of interest in distributed and parallel data management, while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing.
New in this Edition:
• New chapters, covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management.
• Coverage of emerging topics such as data streams and cloud computing.
• Extensive revisions and updates based on years of class testing and feedback.
Ancillary teaching materials are available.

Modern B-Tree Techniques

Author: Goetz Graefe
Publisher: Now Publishers Inc
ISBN: 1601984820
Category : Computers
Languages : en
Pages : 216

Book Description
Invented about 40 years ago and called ubiquitous less than 10 years later, B-tree indexes have been used in a wide variety of computing systems from handheld devices to mainframes and server farms. Over the years, many techniques have been added to the basic design in order to improve efficiency or to add functionality. Examples include separation of updates to structure or contents, utility operations such as non-logged yet transactional index creation, and robust query processing such as graceful degradation during index-to-index navigation. Modern B-Tree Techniques reviews the basics of B-trees and of B-tree indexes in databases, transactional techniques and query processing techniques related to B-trees, B-tree utilities essential for database operations, and many optimizations and improvements. It is intended both as a tutorial and as a reference, enabling researchers to compare index innovations with advanced B-tree techniques and enabling professionals to select features, functions, and tradeoffs most appropriate for their data management challenges.

Introduction to Information Retrieval

Author: Christopher D. Manning
Publisher: Cambridge University Press
ISBN: 1139472100
Category : Computers
Languages : en
Pages :

Book Description
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.