Efficient Indexing Methods for Query Processing on Large Graphs

Author: Yongjiang Liang
Publisher:
ISBN:
Category : Computer science
Languages : en
Pages : 112

Book Description
As the standard formalism and a powerful abstraction of networked data, graphs have been used to model and interpret structured information ranging from protein interactions and program dependences to business coordination and Internet topology. The proliferation of graphs has sparked a growing interest in efficient access methods and flexible, structure-aware querying capabilities on large graphs. To account for the noisy and distorted information that inevitably arises in real-world graphs, and to support virtually any graph management task, it is essential to be able to locate user-specified graph patterns in large graphs. In this thesis, we study subgraph querying and similarity search on large graphs.

In our first project, we study the subgraph query problem when query workload information is available: $W = \{w_1, \ldots, w_n\}$, where each $w_i \in W$ is a previously issued query whose subgraph-isomorphic embeddings have all been identified and cached beforehand. Given a new query $q$, our goal is to exploit $W$ to process and optimize $q$ over a data graph $g$. We introduce a new workload-aware subgraph querying framework, WaSQ (Workload-aware Subgraph Querying), built on the key insight that the query workload can be effectively leveraged for subgraph query rewriting, search plan refinement, partial result reuse, and false-positive embedding filtering, all of which expedite the overall subgraph querying process.

In our second project, we study single-query graph similarity search. Formally, given a graph database $\mathcal{G} = \{g_1, g_2, \ldots, g_n\}$ and a query graph $q$, we aim to find every graph $g_i \in \mathcal{G}$ whose graph edit distance to $q$, GED$(g_i, q)$, is within a user-specified threshold $\tau$, that is, GED$(g_i, q) \le \tau$.
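To make the threshold query concrete, here is a minimal Python sketch of the filter step of a filter-and-verify pipeline. It does not reproduce the thesis's parameterized, partition-based bound; it swaps in the simpler, well-known label-multiset lower bound under unit-cost edit operations (vertex and edge insertion, deletion, and relabeling). The `LabeledGraph` type, the function names, and the example graphs are hypothetical, and graphs that survive the filter would still need exact GED verification, which is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical minimal graph: one label per vertex, (u, v, label) per edge."""
    vertex_labels: list
    edges: list = field(default_factory=list)


def label_multiset_lower_bound(g: LabeledGraph, q: LabeledGraph) -> int:
    """Lower bound on GED(g, q) with unit-cost vertex/edge insertion, deletion
    and relabeling: at most |common labels| vertices (edges) can be matched
    for free, and every other vertex (edge) needs at least one operation."""
    gv, qv = Counter(g.vertex_labels), Counter(q.vertex_labels)
    ge = Counter(label for _, _, label in g.edges)
    qe = Counter(label for _, _, label in q.edges)
    common_v = sum((gv & qv).values())  # size of the multiset intersection
    common_e = sum((ge & qe).values())
    vertex_ops = max(len(g.vertex_labels), len(q.vertex_labels)) - common_v
    edge_ops = max(len(g.edges), len(q.edges)) - common_e
    return vertex_ops + edge_ops


def filter_candidates(database: dict, q: LabeledGraph, tau: int) -> list:
    """Filter step: drop every graph whose lower bound already exceeds tau.
    Survivors are only candidates; exact GED verification is not shown."""
    return [gid for gid, g in database.items()
            if label_multiset_lower_bound(g, q) <= tau]


if __name__ == "__main__":
    db = {
        "g1": LabeledGraph(["C", "C", "O"], [(0, 1, "single"), (1, 2, "double")]),
        "g2": LabeledGraph(["N"] * 4, [(0, 1, "single"), (1, 2, "single"), (2, 3, "single")]),
    }
    q = LabeledGraph(["C", "O", "N"], [(0, 1, "single")])
    print(filter_candidates(db, q, tau=2))  # ['g1']: g2's bound is 5
```

With $\tau = 2$ only `g1` survives (its bound is 2, while `g2`'s is 5), so the costly exact GED would be computed for one graph instead of two. Tighter bounds, such as the partition-based ones described next, prune more graphs before that step.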
We propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds, which jointly prune false-positive graphs from $\mathcal{G}$ before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition the graphs of $\mathcal{G}$ into highly selective subgraphs. These subgraphs are then incorporated into a cost-effective, multi-layered indexing structure, MLIndex (Multi-Layered Index), for GED lower bound cross-checking and false-positive graph filtering with theoretical performance guarantees.

In our third project, we consider the multi-query optimization problem, in which a set of graph similarity queries, each modeled by the well-known graph edit distance (GED) constraint, is posed against a graph database. We examine a new approach to enhancing collective pruning and querying capabilities for graph similarity search in this multi-query scenario. Building on the key observation that relates frequent and rare subgraph patterns of varying sizes to (mis)matching partitions, we select salient features in a principled way to enable selectivity-aware, feature-based graph partitioning, which enhances filtering capabilities for multi-query optimization. Furthermore, we propose multi-query grouping and ordering techniques to further speed up multi-query processing.
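The workload-reuse idea from the first project can be illustrated with a small Python sketch under strong simplifying assumptions: a cached workload query $w$ whose embeddings are all known, and a new query that is $w$ plus one extra edge between existing query vertices. In that special case, the new query's embeddings are exactly the cached embeddings that also cover the extra edge, so no fresh search is needed. This is not the framework's actual rewriting or plan-refinement logic; `build_adj`, `find_embeddings`, `refine_cached`, and the brute-force backtracking matcher are hypothetical illustrations.

```python
def build_adj(num_vertices, edges):
    """Undirected adjacency sets for a graph given as an edge list."""
    adj = {v: set() for v in range(num_vertices)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_embeddings(data_labels, data_adj, q_labels, q_edges):
    """Backtracking enumeration of (non-induced) subgraph-isomorphic
    embeddings: injective, label-preserving maps from query vertices
    0..k-1 to data vertices that preserve every query edge."""
    q_adj = build_adj(len(q_labels), q_edges)
    results, mapping = [], []

    def extend():
        i = len(mapping)
        if i == len(q_labels):
            results.append(tuple(mapping))
            return
        for v in range(len(data_labels)):
            if v in mapping or data_labels[v] != q_labels[i]:
                continue
            # Every already-mapped query neighbour of i must map to a data neighbour of v.
            if all(mapping[j] in data_adj[v] for j in q_adj[i] if j < i):
                mapping.append(v)
                extend()
                mapping.pop()

    extend()
    return results


def refine_cached(cached_embeddings, data_adj, extra_edge):
    """Hypothetical reuse step: the new query is a cached workload query plus
    one extra edge (u, w) between existing query vertices, so its embeddings
    are exactly the cached ones whose images of u and w are adjacent."""
    u, w = extra_edge
    return [m for m in cached_embeddings if m[w] in data_adj[m[u]]]


if __name__ == "__main__":
    data_labels = ["A", "B", "C", "B"]
    data_adj = build_adj(4, [(0, 1), (1, 2), (0, 2), (0, 3)])
    # Workload query w: path B - A - C, with query vertex 0 labeled A.
    cache = {"w": find_embeddings(data_labels, data_adj, ["A", "B", "C"], [(0, 1), (0, 2)])}
    print(cache["w"])                                   # [(0, 1, 2), (0, 3, 2)]
    # New query q = w plus edge (1, 2), i.e. a triangle A-B-C.
    print(refine_cached(cache["w"], data_adj, (1, 2)))  # [(0, 1, 2)]
```

The refinement only checks one adjacency per cached embedding, which is the kind of saving that makes cached workload results worth keeping; general query rewriting over arbitrary overlaps is considerably more involved.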

Query Processing in Large-scale Networks

Author: Miao Qiao
Publisher:
ISBN:
Category : Data structures (Computer science)
Languages : en
Pages : 302

Book Description
Due to the massive size of today's graphs from various domains, even simple graph queries have become challenging. This thesis investigates three queries with a wide range of applications on large graphs. The first is the shortest distance query, a fundamental query that computes the shortest distance between two nodes. The second, weight-constraint reachability (WCR), checks whether there is a feasible path between two nodes whose edge weights satisfy a side constraint. The third, the top-k nearest keywords (k-NK) query, reports, for a query node, the k nearest nodes bearing user-specified keywords. For large-scale graphs with tens of millions of nodes or more, efficient indexing and query optimization techniques must be developed for these queries.
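For reference, WCR and k-NK can both be answered online, without any index, by plain graph traversals; the thesis targets indexing and query optimization precisely because such traversals become expensive at this scale. The Python sketch below is such an unindexed baseline, not the thesis's method. It assumes one common formulation in which the WCR side constraint requires every edge weight on the path to lie in a range [lo, hi], and it simplifies k-NK to a single query keyword. Function names and the example graph are hypothetical.

```python
import heapq
from collections import deque


def wcr_online(adj, s, t, lo, hi):
    """Unindexed WCR check: BFS from s that only follows edges whose weight
    lies in [lo, hi] (the range-style constraint is an assumption)."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v, w in adj.get(u, ()):
            if lo <= w <= hi and v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def k_nearest_keyword(adj, src, keywords_of, keyword, k):
    """Unindexed k-NK baseline: Dijkstra from src (non-negative weights),
    reporting nodes that carry `keyword` in order of shortest distance."""
    dist = {src: 0}
    heap = [(0, src)]
    found = []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u was already settled more cheaply
        if keyword in keywords_of.get(u, ()):
            found.append((d, u))
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found


if __name__ == "__main__":
    # Undirected weighted graph stored as node -> [(neighbour, weight), ...].
    adj = {
        0: [(1, 2), (2, 5)],
        1: [(0, 2), (2, 1), (3, 7)],
        2: [(0, 5), (1, 1), (3, 3)],
        3: [(1, 7), (2, 3)],
    }
    keywords = {1: {"cafe"}, 2: {"bank"}, 3: {"cafe"}}
    print(wcr_online(adj, 0, 3, 1, 3))                      # True (path 0-1-2-3)
    print(wcr_online(adj, 0, 3, 1, 2))                      # False
    print(k_nearest_keyword(adj, 0, keywords, "cafe", 2))   # [(2, 1), (6, 3)]
```

Both functions may touch most of the graph per query; an index trades preprocessing time and space for avoiding that work at query time.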

Efficient Optimization and Processing of Queries Over Text-rich Graph-structured Data

Author: Günter Ladwig
Publisher: KIT Scientific Publishing
ISBN: 3731500159
Category : Computers
Languages : en
Pages : 254

Book Description
Many databases today capture both structured and unstructured data, and making use of such hybrid data has become an important topic in research and industry. The efficient evaluation of queries over hybrid data is the main topic of this thesis. It proposes novel techniques that improve the whole processing pipeline, from indexes and query optimization to run-time processing, and evaluates them in extensive experiments showing that they improve upon the state of the art.

A Study of Graph Partitioning Techniques for Fast Indexing and Query Processing of a Large RDF Graph

Author: Dinesh Barenkala
Publisher:
ISBN:
Category : Electronic dissertations
Languages : en
Pages : 74

Book Description
In recent years, the Resource Description Framework (RDF) [34] has become increasingly important for the Web and in domains such as defense and healthcare. Companies such as the New York Times [40], Best Buy [39], and Pfizer are leveraging RDF and other Semantic Web technologies for data management. In RDF, any assertion can be represented as a (subject, predicate, object) triple, and a collection of triples together forms a graph. Many techniques have been developed for RDF indexing and query processing, and the most popular among them store and process RDF data using an RDBMS. In this thesis, we study the impact of existing graph partitioning techniques on indexing and query processing over a large RDF graph (e.g., YAGO [40]) with millions of vertices and edges. Our goal is to partition a large RDF graph into smaller graphs and then index the smaller graphs efficiently for faster query processing. To cope with cut edges, we compute the 2-hop distance across each cut edge. Once the partitions are computed, we construct an index using a recently developed technique called RIS, which is also used to process queries. We report the benefits and trade-offs of two partitioning strategies on the YAGO dataset in terms of index construction time, index size, and query processing time. The first strategy treats the original RDF graph as an unweighted graph during partitioning; the second treats it as a weighted graph. We compare the results obtained by RIS on the partitioned graphs with those of RDF-3X [38], a state-of-the-art RDF query processing engine.
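As a rough illustration of cut edges in a partitioned RDF graph, the Python sketch below builds an undirected view of a set of (subject, predicate, object) triples (ignoring edge direction and predicate labels), takes a vertex-to-partition assignment as given (the partitioner itself is out of scope), lists the triples whose endpoints fall in different partitions, and collects the vertices within two hops of each boundary vertex. Reading "compute the 2-hop distance across each cut edge" as a 2-hop neighbourhood is an assumption; this is not RIS, and the function names and example triples are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Undirected adjacency view of an RDF graph: subjects and objects are
    vertices, and each triple contributes one edge (predicates dropped)."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj


def cut_triples(triples, part):
    """Triples whose subject and object were assigned to different partitions."""
    return [(s, p, o) for s, p, o in triples if part[s] != part[o]]


def within_two_hops(adj, v):
    """Vertices reachable from v in at most two hops, excluding v itself."""
    one_hop = set(adj[v])
    two_hop = set()
    for u in one_hop:
        two_hop |= adj[u]
    return (one_hop | two_hop) - {v}


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "worksAt", "acme"),
        ("acme", "locatedIn", "paris"),
        ("alice", "livesIn", "paris"),
    ]
    part = {"alice": 0, "bob": 0, "acme": 1, "paris": 1}  # assumed partitioner output
    adj = build_graph(triples)
    for s, p, o in cut_triples(triples, part):
        print((s, p, o), sorted(within_two_hops(adj, s) | within_two_hops(adj, o)))
```

Information gathered around each cut edge is what lets a query that crosses partitions be answered without repeatedly jumping between them; how much to keep is exactly the index-size versus query-time trade-off the study measures.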

Database Systems for Advanced Applications

Author: Christian S. Jensen
Publisher: Springer Nature
ISBN: 3030731979
Category : Computers
Languages : en
Pages : 801

Book Description
The three-volume set LNCS 12681-12683 constitutes the proceedings of the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, held in Taipei, Taiwan, in April 2021. A total of 156 papers presented in this three-volume set were carefully reviewed and selected from 490 submissions. The topic areas of the selected papers include information retrieval, search and recommendation techniques; RDF, knowledge graphs, semantic web, and knowledge management; and spatial, temporal, sequence, and streaming data management, while the dominant keywords are network, recommendation, graph, learning, and model. These topic areas and keywords shed light on the direction in which DASFAA research is moving. Due to the COVID-19 pandemic, the event was held virtually.

Bioinformatics

Author: Information Resources Management Association
Publisher: IGI Global
ISBN: 146663605X
Category : Computers
Languages : en
Pages : 1826

Book Description
"Bioinformatics: Concepts, Methodologies, Tools, and Applications highlights the area of bioinformatics and its impact over the medical community with its innovations that change how we recognize and care for illnesses"--Provided by publisher.

Managing and Mining Graph Data

Author: Charu C. Aggarwal
Publisher: Springer Science & Business Media
ISBN: 1441960457
Category : Computers
Languages : en
Pages : 623

Book Description
Managing and Mining Graph Data is a comprehensive survey book on graph management and mining. It contains extensive surveys on a variety of important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy. It also studies a number of domain-specific scenarios such as stream mining, web graphs, social networks, and chemical and biological data. The chapters are written by well-known researchers in the field and provide a broad perspective of the area. This is the first comprehensive survey book on the emerging topic of graph data processing. Managing and Mining Graph Data is designed for a varied audience of professors, researchers, and practitioners in industry. This volume is also suitable as a reference book for advanced-level database students in computer science and engineering.

On Uncertain Graphs

Author: Arijit Khan
Publisher: Springer Nature
ISBN: 3031018605
Category : Computers
Languages : en
Pages : 80

Book Description
Large-scale, highly interconnected networks, which are often modeled as graphs, pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the underlying data for a variety of reasons, such as noisy measurements, lack of precise information needs, inference and prediction models, or explicit manipulation, e.g., for privacy purposes. Therefore, uncertain, or probabilistic, graphs are increasingly used to represent noisy linked data in many emerging application scenarios, and they have recently become a hot topic in the database and data mining communities. Many classical queries, such as reachability and shortest path queries, become #P-complete and thus more expensive over uncertain graphs. Moreover, various complex queries and analytics are also emerging over uncertain networks, such as pattern matching, information diffusion, and influence maximization queries. In this book, we discuss the sources of uncertain graphs and their applications, uncertainty modeling, and the complexities and algorithmic advances in uncertain graph processing, in the context of both classical and emerging graph queries and analytics. We emphasize the current challenges and highlight some future research directions.
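Because exact s-t reachability over an uncertain graph is #P-complete, a standard workaround is Monte Carlo sampling of possible worlds. The Python sketch below assumes the common independent-edge model (each directed edge exists with its own probability, independently of the others), samples worlds, and runs a BFS in each. It is an illustrative estimator, not an algorithm taken from the book; the function name, parameters, and example edges are hypothetical.

```python
import random
from collections import deque


def reach_probability(edges, s, t, samples=10_000, seed=0):
    """Monte Carlo estimate of Pr[s reaches t] under the independent-edge
    model: each directed edge (u, v, p) exists with probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample one possible world by flipping a biased coin per edge.
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        # Ordinary BFS reachability inside the sampled (deterministic) world.
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            if u == t:
                hits += 1
                break
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return hits / samples


if __name__ == "__main__":
    edges = [("a", "b", 0.5), ("b", "c", 0.8), ("a", "c", 0.1)]
    # Exact value here is 1 - (1 - 0.5 * 0.8) * (1 - 0.1) = 0.46.
    print(reach_probability(edges, "a", "c", samples=20_000))
```

The estimate's standard error shrinks roughly as one over the square root of the number of samples, so accuracy is bought with repeated traversals; much of the algorithmic work on uncertain graphs aims to reduce that sampling cost.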

Database Systems for Advanced Applications

Author: Jianliang Xu
Publisher: Springer Science & Business Media
ISBN: 3642202438
Category : Computers
Languages : en
Pages : 573

Book Description
This book constitutes the workshop proceedings of the 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011, held in Hong Kong, China, in April 2011. The volume contains papers from six workshops, each focusing on specific research issues that contribute to the main themes of the DASFAA conference: the First International Workshop on Graph-structured Data Bases (GDB 2011); the First International Workshop on Spatial Information Modeling, Management and Mining (SIM3 2011); the International Workshop on Flash-based Database Systems (FlashDB 2011); the Second International Workshop on Social Networks and Social Media Mining on the Web (SNSMW 2011); the First International Workshop on Data Management for Emerging Network Infrastructures (DaMEN 2011); and the Fourth International Workshop on Data Quality in Integration Systems (DQIS 2011).

Efficiently Indexing High Dimensional Data Spaces

Author: Christian Böhm
Publisher: Herbert Utz Verlag
ISBN: 9783896754707
Category :
Languages : en
Pages : 266

Book Description