Similarity Joins in Relational Database Systems

Author: Nikolaus Augsten
Publisher: Springer Nature
ISBN: 3031018516
Category : Computers
Languages : en
Pages : 106

Book Description
State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance computations. The basic idea is to decompose complex objects into sets of tokens that can be compared efficiently. Token-based distances are used to compute an approximation of the edit distance and prune expensive edit distance calculations. A key observation when computing similarity joins is that many of the object pairs, for which the similarity is computed, are very different from each other. Filters exploit this property to improve the performance of similarity joins. A filter preprocesses the input data sets and produces a set of candidate pairs. The distance function is evaluated on the candidate pairs only. We describe the essential query processing techniques for filters based on lower and upper bounds. For token equality joins we describe prefix, size, positional and partitioning filters, which can be used to avoid the computation of small intersections that are not needed since the similarity would be too low.
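The filter-then-verify idea in the description above can be illustrated with a small sketch. It is not taken from the book: it assumes strings as the objects, padded q-grams as the tokens, and the well-known length and q-gram count lower bounds as the filters. The function names (`qgrams`, `passes_filters`, `similarity_self_join`) and the padding character `#` are hypothetical choices made for this illustration.

```python
from collections import Counter
from itertools import combinations


def qgrams(s, q=2):
    """Return the bag (multiset) of q-grams of s, padded with q-1 '#' on each side."""
    padded = "#" * (q - 1) + s + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def edit_distance(s, t):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete a character of s
                            curr[j - 1] + 1,            # insert a character of t
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def passes_filters(s, t, tau, q=2):
    """Cheap lower-bound checks; a pair that fails cannot be within distance tau."""
    # Length filter: each edit operation changes the length by at most one.
    if abs(len(s) - len(t)) > tau:
        return False
    # Count filter: each edit operation affects at most q of the padded q-grams.
    overlap = sum((qgrams(s, q) & qgrams(t, q)).values())
    needed = max(len(s), len(t)) + q - 1 - q * tau
    return overlap >= needed


def similarity_self_join(strings, tau, q=2):
    """Return (candidate pairs, result pairs) for edit distance <= tau."""
    candidates = [(s, t) for s, t in combinations(strings, 2)
                  if passes_filters(s, t, tau, q)]
    # The expensive distance function is evaluated on the candidates only.
    results = [(s, t) for s, t in candidates if edit_distance(s, t) <= tau]
    return candidates, results
```

Because both filters are lower bounds, they only discard pairs that cannot qualify, so the join result matches a brute-force comparison of all pairs while the edit distance is computed for fewer of them.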

Similarity Search and Applications

Author: Giuseppe Amato
Publisher: Springer
ISBN: 3319250876
Category : Computers
Languages : en
Pages : 363

Book Description
This book constitutes the proceedings of the 8th International Conference on Similarity Search and Applications, SISAP 2015, held in Glasgow, UK, in October 2015. The 19 full papers, 12 short and 9 demo and poster papers presented in this volume were carefully reviewed and selected from 68 submissions. They are organized in topical sections named: improving similarity search methods and techniques; metrics and evaluation; applications and specific domains; implementation and engineering solutions; posters; demo papers.

Database Systems for Advanced Applications

Author: Jayant R. Haritsa
Publisher: Springer Science & Business Media
ISBN: 3540785671
Category : Computers
Languages : en
Pages : 734

Book Description
This book constitutes the refereed proceedings of the 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, held in New Delhi, India, in March 2008. The 30 revised full papers and 27 revised short papers presented together with the abstracts of 3 invited talks as well as 8 demonstration papers and a panel discussion motivation were carefully reviewed and selected from 173 submissions. The papers are organized in topical sections on XML schemas, data mining, spatial data, indexes and cubes, data streams, P2P and transactions, XML processing, complex pattern processing, IR techniques, queries and transactions, data mining, XML databases, data warehouses and industrial applications, as well as mobile and distributed data.

Database Systems for Advanced Applications

Author: Weiyi Meng
Publisher: Springer
ISBN: 3642374506
Category : Computers
Languages : en
Pages : 507

Book Description
This two-volume set, LNCS 7825 and LNCS 7826, constitutes the refereed proceedings of the 18th International Conference on Database Systems for Advanced Applications, DASFAA 2013, held in Wuhan, China, in April 2013. The 51 revised full papers and 10 short papers presented together with 2 invited keynote talks, 1 invited paper, 3 industrial papers, 9 demo presentations, 4 tutorials and 1 panel paper were carefully reviewed and selected from a total of 227 submissions. The topics covered in part 1 are social networks; query processing; nearest neighbor search; index; query analysis; XML data management; privacy protection; and uncertain data management; and in part 2: graph data management; physical design; knowledge management; temporal data management; social networks; query processing; data mining; applications; and database applications.

Data-Intensive Workflow Management

Author: Daniel Oliveira
Publisher: Springer Nature
ISBN: 3031018729
Category : Computers
Languages : en
Pages : 161

Book Description
Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually present a considerable number of activities and activations (i.e., tasks associated with activities) and may need a long time for execution. Due to the continuous need to store and process data efficiently (making them data-intensive workflows), high-performance computing environments allied to parallelization techniques are used to run these workflows. At the beginning of the 2010s, cloud technologies emerged as a promising environment to run scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. More recently, Data-Intensive Scalable Computing (DISC) frameworks (e.g., Apache Spark and Hadoop) and environments emerged and are being used to execute data-intensive workflows. DISC environments are composed of processors and disks in large commodity computing clusters connected using high-speed communications switches and networks. The main advantage of DISC frameworks is that they support and grant efficient in-memory data management for large-scale applications, such as data-intensive workflows. However, the execution of workflows in cloud and DISC environments raises many challenges such as scheduling workflow activities and activations, managing produced data, collecting provenance data, etc. Several existing approaches deal with these challenges. There is therefore a real need to understand how to manage these workflows and the various big data platforms that have been developed and introduced. As such, this book can help researchers understand how linking workflow management with Data-Intensive Scalable Computing can help in understanding and analyzing scientific big data.
In this book, we aim to identify and distill the body of work on workflow management in clouds and DISC environments. We start by discussing the basic principles of data-intensive scientific workflows. Next, we present two workflows that are executed in single-site and multi-site clouds taking advantage of provenance. Afterward, we go towards workflow management in DISC environments, and we present, in detail, solutions that enable the optimized execution of the workflow using frameworks such as Apache Spark and its extensions.

On Transactional Concurrency Control

Author: Goetz Graefe
Publisher: Springer Nature
ISBN: 3031018737
Category : Computers
Languages : en
Pages : 383

Book Description
This book contains a number of chapters on transactional database concurrency control. A two-sentence summary of the volume's entire sequence of chapters is this: traditional locking techniques can be improved in multiple dimensions, notably in lock scopes (sizes), lock modes (increment, decrement, and more), lock durations (late acquisition, early release), and lock acquisition sequence (to avoid deadlocks). Even if some of these improvements can be transferred to optimistic concurrency control, notably a fine granularity of concurrency control with serializable transaction isolation including phantom protection, pessimistic concurrency control is categorically superior to optimistic concurrency control, i.e., independent of application, workload, deployment, hardware, and software implementation.

Community Search over Big Graphs

Author: Xin Huang
Publisher: Springer Nature
ISBN: 3031018745
Category : Computers
Languages : en
Pages : 188

Book Description
Communities serve as basic structural building blocks for understanding the organization of many real-world networks, including social, biological, collaboration, and communication networks. Recently, community search over graphs has attracted significantly increasing attention, from small, simple, and static graphs to big, evolving, attributed, and location-based graphs. In this book, we first review the basic concepts of networks, communities, and various kinds of dense subgraph models. We then survey the state of the art in community search techniques on various kinds of networks across different application areas. Specifically, we discuss cohesive community search, attributed community search, social circle discovery, and geo-social group search. We highlight the challenges posed by different community search problems. We present their motivations, principles, methodologies, algorithms, and applications, and provide a comprehensive comparison of the existing techniques. This book finally concludes by listing publicly available real-world datasets and useful tools for facilitating further research, and by offering further readings and future directions of research in this important and growing area.

On Uncertain Graphs

Author: Arijit Khan
Publisher: Springer Nature
ISBN: 3031018605
Category : Computers
Languages : en
Pages : 80

Book Description
Large-scale, highly interconnected networks, which are often modeled as graphs, pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the underlying data due to a variety of reasons, such as noisy measurements, lack of precise information needs, inference and prediction models, or explicit manipulation, e.g., for privacy purposes. Therefore, uncertain, or probabilistic, graphs are increasingly used to represent noisy linked data in many emerging application scenarios, and they have recently become a hot topic in the database and data mining communities. Many classical problems, such as reachability and shortest-path queries, become #P-complete and, thus, more expensive over uncertain graphs. Moreover, various complex queries and analytics are also emerging over uncertain networks, such as pattern matching, information diffusion, and influence maximization queries. In this book, we discuss the sources of uncertain graphs and their applications, uncertainty modeling, as well as the complexities and algorithmic advances on uncertain graph processing in the context of both classical and emerging graph queries and analytics. We emphasize the current challenges and highlight some future research directions.
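To make the cost of reachability over uncertain graphs concrete, here is a minimal sketch of a sampling-based estimate. It is an illustration only, not the book's method: it assumes a directed graph in which every edge exists independently with a given probability, and it swaps exact computation for plain Monte Carlo sampling of possible worlds. The names `sample_reachable` and `estimate_reachability` and the edge dictionary layout are hypothetical.

```python
import random
from collections import deque


def sample_reachable(edges, source, target, rng):
    """Draw one possible world and test whether target is reachable from source."""
    adjacency = {}
    for (u, v), p in edges.items():
        if rng.random() < p:  # keep this edge with probability p
            adjacency.setdefault(u, []).append(v)
    # Breadth-first search over the sampled (now deterministic) graph.
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def estimate_reachability(edges, source, target, samples=10000, seed=0):
    """Estimate the reachability probability as the fraction of worlds with a path."""
    rng = random.Random(seed)
    hits = sum(sample_reachable(edges, source, target, rng) for _ in range(samples))
    return hits / samples
```

For a three-edge graph with edges a→b, b→c, and a→c, each present with probability 0.5, the exact answer is 1 - (1 - 0.5)(1 - 0.25) = 0.625, and the estimate should approach that value as the number of samples grows.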

Natural Language Data Management and Interfaces

Author: Yunyao Li
Publisher: Springer Nature
ISBN: 3031018621
Category : Computers
Languages : en
Pages : 136

Book Description
The volume of natural language text data has been rapidly increasing over the past two decades, due to factors such as the growth of the Web, the low cost associated with publishing, and the progress on the digitization of printed texts. This growth combined with the proliferation of natural language systems for searching and retrieving information provides tremendous opportunities for studying some of the areas where database systems and natural language processing systems overlap. This book explores two interrelated and important areas of overlap: (1) managing natural language data and (2) developing natural language interfaces to databases. It presents relevant concepts and research questions, state-of-the-art methods, related systems, and research opportunities and challenges covering both areas. Relevant topics discussed on natural language data management include data models, data sources, queries, storage and indexing, and transforming natural language text. Under natural language interfaces, it presents the anatomy of these interfaces to databases, the challenges related to query understanding and query translation, and relevant aspects of user interactions. Each of the challenges is covered in a systematic way: first starting with a quick overview of the topics, followed by a comprehensive view of recent techniques that have been proposed to address the challenge along with illustrative examples. It also reviews some notable systems in detail in terms of how they address different challenges and their contributions. Finally, it discusses open challenges and opportunities for natural language management and interfaces. The goal of this book is to provide an introduction to the methods, problems, and solutions that are used in managing natural language data and building natural language interfaces to databases. It serves as a starting point for readers who are interested in pursuing additional work on these exciting topics in both academic and industrial environments.

Human Interaction with Graphs

Author: Sourav S. Bhowmick
Publisher: Springer Nature
ISBN: 3031018613
Category : Computers
Languages : en
Pages : 186

Book Description
Interacting with graphs using queries has emerged as an important research problem for real-world applications that center on large graph data. Given the syntactic complexity of graph query languages (e.g., SPARQL, Cypher), visual graph query interfaces make it easy for non-programmers to query such graph data repositories. In this book, we present recent developments in the emerging area of the visual graph querying paradigm, which bridges traditional graph querying with human-computer interaction (HCI). Specifically, we focus on techniques that emphasize deep integration between the visual graph query interface and the underlying graph query engine. We discuss various strategies and guidance for constructing graph queries visually, interleaving processing of graph queries and visual actions, visual exploration of graph query results, and automated performance study of visual graph querying frameworks. In addition, this book highlights open problems and new research directions. In summary, we review the research thus far into the integration of HCI and graph querying to facilitate user-friendly interaction with graph-structured data, giving researchers a snapshot of the current state of the art in this topic and of future research directions.