Scalable Algorithms for Cloud-based Semantic Web Data Management

Scalable Algorithms for Cloud-based Semantic Web Data Management PDF Author: Stamatis Zampetakis
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Get Book Here

Book Description
In order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, proposing standard ways for representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus moved to cloud computing.Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault-tolerance, and elastic allocation of computing and storage resources following the needs of the users.This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks.First, we introduce the basic concepts around Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing on the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform is running in the cloud and appropriate APIs are provided to the end-users for storing and retrieving RDF data. We explore various storage and querying strategies revealing pros and cons with respect to performance and also to monetary cost, which is a important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possibles. Inspired by existing partitioning and indexing techniques we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm demonstrating also the overall performance of the system.

Scalable Algorithms for Cloud-based Semantic Web Data Management

Scalable Algorithms for Cloud-based Semantic Web Data Management PDF Author: Stamatis Zampetakis
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Get Book Here

Book Description
In order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, proposing standard ways for representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus moved to cloud computing.Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault-tolerance, and elastic allocation of computing and storage resources following the needs of the users.This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks.First, we introduce the basic concepts around Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing on the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform is running in the cloud and appropriate APIs are provided to the end-users for storing and retrieving RDF data. We explore various storage and querying strategies revealing pros and cons with respect to performance and also to monetary cost, which is a important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possibles. Inspired by existing partitioning and indexing techniques we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm demonstrating also the overall performance of the system.

Semantics Empowered Web 3.0

Semantics Empowered Web 3.0 PDF Author: Amit Sheth
Publisher: Morgan & Claypool Publishers
ISBN: 1608457168
Category : Computers
Languages : en
Pages : 178

Get Book Here

Book Description
Presents a unified approach to harness and exploit all forms of contemporary Web resources using the core principles of ability to associate meaning with data through conceptual or domain models and semantic descriptions, including annotations, and through advanced semantic techniques for search, integration, and analysis.

Secure Data Provenance and Inference Control with Semantic Web

Secure Data Provenance and Inference Control with Semantic Web PDF Author: Bhavani Thuraisingham
Publisher: CRC Press
ISBN: 1466569433
Category : Computers
Languages : en
Pages : 482

Get Book Here

Book Description
With an ever-increasing amount of information on the web, it is critical to understand the pedigree, quality, and accuracy of your data. Using provenance, you can ascertain the quality of data based on its ancestral data and derivations, track back to sources of errors, allow automatic re-enactment of derivations to update data, and provide attribution of the data source. Secure Data Provenance and Inference Control with Semantic Web supplies step-by-step instructions on how to secure the provenance of your data to make sure it is safe from inference attacks. It details the design and implementation of a policy engine for provenance of data and presents case studies that illustrate solutions in a typical distributed health care system for hospitals. Although the case studies describe solutions in the health care domain, you can easily apply the methods presented in the book to a range of other domains. The book describes the design and implementation of a policy engine for provenance and demonstrates the use of Semantic Web technologies and cloud computing technologies to enhance the scalability of solutions. It covers Semantic Web technologies for the representation and reasoning of the provenance of the data and provides a unifying framework for securing provenance that can help to address the various criteria of your information systems. Illustrating key concepts and practical techniques, the book considers cloud computing technologies that can enhance the scalability of solutions. After reading this book you will be better prepared to keep up with the on-going development of the prototypes, products, tools, and standards for secure data management, secure Semantic Web, secure web services, and secure cloud computing.

Cloud-Based RDF Data Management

Cloud-Based RDF Data Management PDF Author: Zoi Kaoudi
Publisher: Springer Nature
ISBN: 3031018753
Category : Computers
Languages : en
Pages : 91

Get Book Here

Book Description
Resource Description Framework (or RDF, in short) is set to deliver many of the original semi-structured data promises: flexible structure, optional schema, and rich, flexible Universal Resource Identifiers as a basis for information sharing. Moreover, RDF is uniquely positioned to benefit from the efforts of scientific communities studying databases, knowledge representation, and Web technologies. As a consequence, the RDF data model is used in a variety of applications today for integrating knowledge and information: in open Web or government data via the Linked Open Data initiative, in scientific domains such as bioinformatics, and more recently in search engines and personal assistants of enterprises in the form of knowledge graphs. Managing such large volumes of RDF data is challenging due to the sheer size, heterogeneity, and complexity brought by RDF reasoning. To tackle the size challenge, distributed architectures are required. Cloud computing is an emerging paradigm massively adopted in many applications requiring distributed architectures for the scalability, fault tolerance, and elasticity features it provides. At the same time, interest in massively parallel processing has been renewed by the MapReduce model and many follow-up works, which aim at simplifying the deployment of massively parallel data management tasks in a cloud environment. In this book, we study the state-of-the-art RDF data management in cloud environments and parallel/distributed architectures that were not necessarily intended for the cloud, but can easily be deployed therein. After providing a comprehensive background on RDF and cloud technologies, we explore four aspects that are vital in an RDF data management system: data storage, query processing, query optimization, and reasoning. We conclude the book with a discussion on open problems and future directions.

Semantics Empowered Web 3.0

Semantics Empowered Web 3.0 PDF Author: Amit Sheth
Publisher: Springer Nature
ISBN: 303101894X
Category : Computers
Languages : en
Pages : 159

Get Book Here

Book Description
After the traditional document-centric Web 1.0 and user-generated content focused Web 2.0, Web 3.0 has become a repository of an ever growing variety of Web resources that include data and services associated with enterprises, social networks, sensors, cloud, as well as mobile and other devices that constitute the Internet of Things. These pose unprecedented challenges in terms of heterogeneity (variety), scale (volume), and continuous changes (velocity), as well as present corresponding opportunities if they can be exploited. Just as semantics has played a critical role in dealing with data heterogeneity in the past to provide interoperability and integration, it is playing an even more critical role in dealing with the challenges and helping users and applications exploit all forms of Web 3.0 data. This book presents a unified approach to harness and exploit all forms of contemporary Web resources using the core principles of ability to associate meaning with data through conceptual or domain models and semantic descriptions including annotations, and through advanced semantic techniques for search, integration, and analysis. It discusses the use of Semantic Web standards and techniques when appropriate, but also advocates the use of lighter weight, easier to use, and more scalable options when they are more suitable. The authors' extensive experience spanning research and prototypes to development of operational applications and commercial technologies and products guide the treatment of the material. Table of Contents: Role of Semantics and Metadata / Types and Models of Semantics / Annotation -- Adding Semantics to Data / Semantics for Enterprise Data / Semantics for Services / Semantics for Sensor Data / Semantics for Social Data / Semantics for Cloud Computing / Semantics for Advanced Applications

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXX

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXX PDF Author: Abdelkader Hameurlain
Publisher: Springer
ISBN: 3662540541
Category : Computers
Languages : en
Pages : 140

Get Book Here

Book Description
This, the 30th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains six in-depth papers focusing on the subject of cloud computing. Topics covered within this context include cloud storage, model-driven development, informative modeling, and security-critical systems.

Transactions on Large-Scale Data- and Knowledge-Centered Systems XX

Transactions on Large-Scale Data- and Knowledge-Centered Systems XX PDF Author: Abdelkader Hameurlain
Publisher: Springer
ISBN: 3662467038
Category : Computers
Languages : en
Pages : 169

Get Book Here

Book Description
The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. Current decentralized systems still focus on data and knowledge as their main resource. Feasibility of these systems relies basically on P2P (peer-to-peer) techniques and the support of agent systems with scaling and decentralized control. Synergy between grids, P2P systems, and agent technologies is the key to data- and knowledge-centered systems in large-scale environments. This, the 20th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, presents a representative and useful selection of articles covering a wide range of important topics in the domain of advanced techniques for big data management. Big data has become a popular term, used to describe the exponential growth and availability of data. The recent radical expansion and integration of computation, networking, digital devices, and data storage has provided a robust platform for the explosion in big data, as well as being the means by which big data are generated, processed, shared, and analyzed. In general, data are only useful if meaning and value can be extracted from them. Big data discovery enables data scientists and other analysts to uncover patterns and correlations through analysis of large volumes of data of diverse types. Insights gleaned from big data discovery can provide businesses with significant competitive advantages, leading to more successful marketing campaigns, decreased customer churn, and reduced loss from fraud. In practice, the growing demand for large-scale data processing and data analysis applications has spurred the development of novel solutions from both industry and academia.

Big Data Computing

Big Data Computing PDF Author: Rajendra Akerkar
Publisher: CRC Press
ISBN: 1466578386
Category : Business & Economics
Languages : en
Pages : 562

Get Book Here

Book Description
Due to market forces and technological evolution, Big Data computing is developing at an increasing rate. A wide variety of novel approaches and tools have emerged to tackle the challenges of Big Data, creating both more opportunities and more challenges for students and professionals in the field of data computation and analysis. Presenting a mix

Semantic Web for the Working Ontologist

Semantic Web for the Working Ontologist PDF Author: James Hendler
Publisher: Morgan & Claypool
ISBN: 1450376150
Category : Computers
Languages : en
Pages : 510

Get Book Here

Book Description
Enterprises have made amazing advances by taking advantage of data about their business to provide predictions and understanding of their customers, markets, and products. But as the world of business becomes more interconnected and global, enterprise data is no long a monolith; it is just a part of a vast web of data. Managing data on a world-wide scale is a key capability for any business today. The Semantic Web treats data as a distributed resource on the scale of the World Wide Web, and incorporates features to address the challenges of massive data distribution as part of its basic design. The aim of the first two editions was to motivate the Semantic Web technology stack from end-to-end; to describe not only what the Semantic Web standards are and how they work, but also what their goals are and why they were designed as they are. It tells a coherent story from beginning to end of how the standards work to manage a world-wide distributed web of knowledge in a meaningful way. The third edition builds on this foundation to bring Semantic Web practice to enterprise. Fabien Gandon joins Dean Allemang and Jim Hendler, bringing with him years of experience in global linked data, to open up the story to a modern view of global linked data. While the overall story is the same, the examples have been brought up to date and applied in a modern setting, where enterprise and global data come together as a living, linked network of data. Also included with the third edition, all of the data sets and queries are available online for study and experimentation at data.world/swwo.

Frontiers in Massive Data Analysis

Frontiers in Massive Data Analysis PDF Author: National Research Council
Publisher: National Academies Press
ISBN: 0309287812
Category : Mathematics
Languages : en
Pages : 191

Get Book Here

Book Description
Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data. Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale-terabytes and petabytes-is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge-from computer science, statistics, machine learning, and application disciplines-that must be brought to bear to make useful inferences from massive data.