Efficient Optimization and Processing of Queries Over Text-rich Graph-structured Data
Author: Günter Ladwig
Publisher: KIT Scientific Publishing
ISBN: 3731500159
Category : Computers
Languages : en
Pages : 254
Book Description
Many databases today capture both structured and unstructured data. Making use of such hybrid data has become an important topic in research and industry. The efficient evaluation of hybrid data queries is the main topic of this thesis. Novel techniques are proposed that improve the whole processing pipeline, from indexes and query optimization to run-time processing. The contributions are evaluated in extensive experiments showing that the proposed techniques improve upon the state of the art.
Database and XML Technologies
Author: Sihem Amer-Yahia
Publisher: Springer
ISBN: 3540388796
Category : Computers
Languages : en
Pages : 130
Book Description
This book constitutes the refereed proceedings of the 4th International XML Database Symposium, XSym 2006, held in conjunction with the International Conference on Very Large Data Bases, VLDB 2006. The book presents 8 revised full papers, focused on building XML repositories and covering query processing, caching, indexing and navigation support, structural matching, temporal XML, and XML updates. Topical sections include query evaluation and temporal XML, XPath and twigs, and XML updates.
Decentralized Query Processing Over Heterogeneous Sources of Knowledge Graphs
Author: L. Heling
Publisher: IOS Press
ISBN: 164368261X
Category : Computers
Languages : en
Pages : 326
Book Description
Knowledge graphs are increasingly used in scientific and industrial applications. The large number and size of knowledge graphs published as Linked Data in autonomous sources has led to the development of various interfaces to query these knowledge graphs. Therefore, effective query processing approaches that enable efficient information retrieval from these knowledge graphs need to address the capabilities and limitations of different Linked Data Fragment interfaces. This book investigates novel approaches to addressing the challenges that arise in the presence of decentralized, heterogeneous sources of knowledge graphs. The effectiveness of these approaches is empirically evaluated and demonstrated throughout, using various real-world and synthetic large-scale knowledge graphs. First, a sample-based approach for generating fine-grained performance profiles is proposed, and it is demonstrated how the information from such profiles can be leveraged in cost model-based query planning. In addition, a sample-based data distribution profiling approach is advocated, which aims to estimate the statistical profile features of large knowledge graphs, and the applicability of these estimations in federated query processing is demonstrated. The remainder of the book focuses on techniques to devise efficient query processing approaches when heterogeneous interfaces need to be queried but no fine-grained statistics are available. Robust techniques to support efficient query processing in these circumstances are investigated, and results are presented that show how these techniques can outperform state-of-the-art approaches. Finally, the author describes a framework for federated query processing over heterogeneous federations of Linked Data Fragments that exploits the capabilities of different sources by defining interface-aware approaches.
Learning Spark
Author: Jules S. Damji
Publisher: O'Reilly Media
ISBN: 1492050016
Category : Computers
Languages : en
Pages : 400
Book Description
Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to:
- Learn Python, SQL, Scala, or Java high-level Structured APIs
- Understand Spark operations and the SQL engine
- Inspect, tune, and debug Spark operations with Spark configurations and the Spark UI
- Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow
Frontiers in Massive Data Analysis
Author: National Research Council
Publisher: National Academies Press
ISBN: 0309287812
Category : Mathematics
Languages : en
Pages : 191
Book Description
Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity, and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data. Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale (terabytes and petabytes) is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge (from computer science, statistics, machine learning, and application disciplines) that must be brought to bear to make useful inferences from massive data.
IJCAI-89
Author: International Joint Conferences on Artificial Intelligence
Publisher:
ISBN:
Category : Artificial intelligence
Languages : en
Pages : 892
Book Description
Proceedings of the ... International Joint Conference on Artificial Intelligence
Author:
Publisher:
ISBN:
Category : Artificial intelligence
Languages : en
Pages : 900
Book Description
Handbook of Research on Big Data Storage and Visualization Techniques
Author: Segall, Richard S.
Publisher: IGI Global
ISBN: 1522531432
Category : Computers
Languages : en
Pages : 1078
Book Description
The digital age has brought exponential growth in the amount of data available to individuals looking to draw conclusions based on given or collected information across industries. Challenges associated with the analysis, security, sharing, storage, and visualization of large and complex data sets continue to plague data scientists and analysts alike, as traditional data processing applications struggle to adequately manage big data. The Handbook of Research on Big Data Storage and Visualization Techniques is a critical scholarly resource that explores big data analytics and technologies and their role in developing a broad understanding of issues pertaining to the use of big data in multidisciplinary fields. Featuring coverage of a broad range of topics, such as architecture patterns, programming systems, and computational energy, this publication is geared towards professionals, researchers, and students seeking current research and application topics on the subject.
Very Large Data Bases
Author:
Publisher:
ISBN:
Category : Database management
Languages : en
Pages : 496
Book Description
Modern B-Tree Techniques
Author: Goetz Graefe
Publisher: Now Publishers Inc
ISBN: 1601984820
Category : Computers
Languages : en
Pages : 216
Book Description
Invented about 40 years ago and called ubiquitous less than 10 years later, B-tree indexes have been used in a wide variety of computing systems from handheld devices to mainframes and server farms. Over the years, many techniques have been added to the basic design in order to improve efficiency or to add functionality. Examples include separation of updates to structure or contents, utility operations such as non-logged yet transactional index creation, and robust query processing such as graceful degradation during index-to-index navigation. Modern B-Tree Techniques reviews the basics of B-trees and of B-tree indexes in databases, transactional techniques and query processing techniques related to B-trees, B-tree utilities essential for database operations, and many optimizations and improvements. It is intended both as a tutorial and as a reference, enabling researchers to compare index innovations with advanced B-tree techniques and enabling professionals to select features, functions, and tradeoffs most appropriate for their data management challenges.