Efficient Optimization and Processing of Queries Over Text-rich Graph-structured Data

Efficient Optimization and Processing of Queries Over Text-rich Graph-structured Data

Author: Günter Ladwig
Publisher: KIT Scientific Publishing
ISBN: 3731500159
Category : Computers
Languages : en
Pages : 254

Book Description
Many databases today capture both structured and unstructured data. Making use of such hybrid data has become an important topic in research and industry. The efficient evaluation of hybrid data queries is the main topic of this thesis. Novel techniques are proposed that improve the whole processing pipeline, from indexes and query optimization to run-time processing. The contributions are evaluated in extensive experiments showing that the proposed techniques improve upon the state of the art.

Efficient Optimization and Processing of Queries Over Text-rich Graph-structured Data

Author: Günter Ladwig
Publisher:
ISBN: 9781000034424
Category :
Languages : en
Pages : 0

Ranking for Web Data Search Using On-The-Fly Data Integration

Author: Herzig, Daniel Markus
Publisher: KIT Scientific Publishing
ISBN: 3731501368
Category : Computers
Languages : en
Pages : 222

Book Description
Ranking - the algorithmic decision on how relevant an information artifact is for a given information need and the sorting of artifacts by their concluded relevancy - is an integral part of every search engine. In this book we investigate how structured Web data can be leveraged for ranking with the goal of improving the effectiveness of search. We propose new solutions for ranking using on-the-fly data integration and experimentally analyze and evaluate them against the latest baselines.

Query Processing over Graph-structured Data on the Web

Author: M. Acosta Deibe
Publisher: IOS Press
ISBN: 1614999163
Category : Computers
Languages : en
Pages : 244

Book Description
In recent years, Linked Data initiatives have encouraged the publication of large graph-structured datasets using the Resource Description Framework (RDF). Due to the constant growth of RDF data on the web, more flexible data management infrastructures must be able to efficiently and effectively exploit the vast amount of knowledge accessible on the web. This book presents flexible query processing strategies over RDF graphs on the web using the SPARQL query language. In this work, we show how query engines can change plans on the fly with adaptive techniques to cope with unpredictable conditions and to reduce execution time. Furthermore, this work investigates the application of crowdsourcing in query processing, where engines are able to contact humans to enhance the quality of query answers. The theoretical and empirical results presented in this book indicate that flexible techniques allow for querying RDF data sources efficiently and effectively.

Efficient Algorithms for Querying Large-Scale Data in Relational, XML, and Graph-Structured Data Repositories

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
We live in an information age, and data are ubiquitous today. Various applications, ranging from scientific computing, medical research, and bioinformatics to administrative management, commercial sales, and financial marketing, generate and utilize data every day. Many of these applications are data intensive, with the amount of data involved potentially reaching hundreds of thousands of gigabytes. Further, different applications store data using different data models. For example, applications could store and manage structured data using a flat (relational) model, semi-structured data using a hierarchical (XML) model, and less-structured data using a more general and flexible graph model. In this thesis, I report my research results on efficiently querying large-scale data in relational, XML, and graph-structured data repositories. Specifically, this thesis covers three research projects, which I was invited to present at the ACM SIGMOD conference in 2006, 2007, and 2008, respectively. The first project concerns efficient querying of relational data using materialized views and introduces our efficient view-based query-optimization algorithms that support a large and practically important subset of SQL queries. The second project focuses on efficiently querying XML data and presents efficient algorithms for evaluating XPath queries over XML streams, which are the first to achieve O(|D||Q|) time performance, where |D| is the XML data size and |Q| is the XPath query size. Meanwhile, our algorithm EQ also achieves optimal space performance. The third project addresses efficient querying of graph-structured data by introducing efficient algorithms for retrieving top-ranked tree-pattern matches from large graphs. While a tree-pattern query could have an extremely large, potentially exponential, number of answer matches in a graph, our algorithms exhibit time and space performance that is linear or sub-linear in the size of the input data.

Query Processing and Indexing Techniques on Semi-structured Data

Author: Hao He
Publisher:
ISBN:
Category : Data structures (Computer science)
Languages : en
Pages : 520

Book Description
Queries over semi-structured data consider its textual contents as well as its structure. Query processing is challenging because of the lack of schema and the richness in structure. This dissertation develops a collection of query processing and indexing techniques to support efficient queries over tree- and graph-structured data. Specific contributions include (1) practical index structures for evaluating label path expressions and checking graph reachability, two fundamental primitives for query processing over semi-structured data, and (2) a keyword search system for finding and ranking substructures of interest within text-labeled graph-structured data.

Querying Graphs

Author: Angela Bonifati
Publisher: Morgan & Claypool Publishers
ISBN: 1681734311
Category : Computers
Languages : en
Pages : 186

Book Description
Graph data modeling and querying arise in many practical application domains such as social and biological networks, where the primary focus is on concepts and their relationships and the rich patterns in these complex webs of interconnectivity. In this book, we present a concise unified view on the basic challenges which arise over the complete life cycle of formulating and processing queries on graph databases. To that end, we present all major concepts relevant to this life cycle, formulated in terms of a common and unifying ground: the property graph data model, the predominant data model adopted by modern graph database systems. We aim especially to give a coherent and in-depth perspective on current graph querying and an outlook for future developments. Our presentation is self-contained, covering the relevant topics: graph data models, graph query languages and graph query specification, graph constraints, and graph query processing. We conclude by indicating major open research challenges towards the next generation of graph data management systems.

Efficient Indexing Methods for Query Processing on Large Graphs

Author: Yongjiang Liang
Publisher:
ISBN:
Category : Computer science
Languages : en
Pages : 112

Book Description
As the standard formalism and a powerful abstraction of networked data, graphs have been used to model and interpret structured information ranging from protein interaction and program dependence to business coordination and Internet topology. The proliferation of graphs has sparked a growing interest in enabling efficient access methods and flexible, structure-aware querying capabilities on large graphs. To account for the noisy and distorted information that arises unavoidably in real-world graphs, and for virtually any graph management task, it is essential and highly desirable to enable locating user-specified graph patterns on large graphs. In this thesis, we worked on subgraph query and similarity search problems on large graphs. In our first project, we worked on the subgraph query problem. We consider subgraph querying with the availability of query workload information, $W = \{w_1, \ldots, w_n\}$, where $w_i \in W$ is a previously issued query with all its subgraph-isomorphic embeddings identified and cached beforehand. Given a new query $q$, our goal is to exploit $W$ for subgraph query processing and optimization of $q$ in $g$. We introduce a new, workload-aware subgraph querying framework, WaSQ (Workload-aware Subgraph Querying), built upon the key insight that query workload can be effectively leveraged for subgraph query rewriting, search plan refinement, partial result reuse, and false-positive embedding filtering toward expediting the whole subgraph querying process. In our second project, we worked on the single-query based similarity search problem. Formally, given a graph database $\mathcal{G} = \{g_1, g_2, \ldots, g_n\}$ and a query graph $q$, we aim to find the graphs $g_i \in \mathcal{G}$ such that the graph edit distance between $g_i$ and $q$, GED$(g_i, q)$, is within a user-specified GED threshold, $\tau$. We propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds towards synergistically pruning false-positive graphs from $\mathcal{G}$ before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition graphs of $\mathcal{G}$ into highly selective subgraphs. They are further incorporated in a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower bound crosschecking and false-positive graph filtering with theoretical performance guarantees. In our third project, we consider the multi-query optimization problem, where a set of graph similarity queries, modeled by the well-known graph edit distance (GED) constraint, are posed against a graph database. We examine a new approach to enhancing collective pruning and querying capabilities for graph similarity search in a multi-query scenario. In light of the key observation that relates varying-size frequent and rare subgraph patterns to (mis)matching partitions, we select salient features in a principled way to enable selectivity-aware, feature-based graph partitioning, leading to enhanced filtering capabilities for multi-query optimization. Furthermore, we propose multi-query grouping and ordering techniques to speed up multi-query processing.

Query Processing Over Graph-structured Data on the Web

Author: Maribel Acosta Deibe
Publisher:
ISBN: 9783898387385
Category :
Languages : en
Pages :

Main Memory Management on Relational Database Systems

Author: Pedro Mejia Alvarez
Publisher: Springer Nature
ISBN: 3031132955
Category : Computers
Languages : en
Pages : 115

Book Description
This book provides basic knowledge about main memory management in relational databases as it is needed to support large-scale applications processed completely in memory. In business operations, real-time predictability and high speed are a must. Hence every opportunity must be exploited to improve performance, including reducing dependency on the hard disk, adding more memory to make more data resident in memory, and even deploying an in-memory system where all data can be kept in memory. The book provides one chapter for each of the main related topics, i.e. the memory system, memory management, virtual memory, and databases and their memory systems, and it is complemented by a short survey of six commercial systems: TimesTen, MySQL, VoltDB, Hekaton, HyPer/ScyPer, and SAP HANA.