Big Graph Analytics on Just A Single PC

Big Graph Analytics on Just A Single PC PDF Author: Kai Wang
Publisher:
ISBN:
Category :
Languages : en
Pages : 146

Book Description
As graph data becomes ubiquitous in modern computing, developing systems to efficiently process large graphs has gained increasing popularity. There are two major types of analytical problems over large graphs: graph computation and graph mining. Graph computation includes a set of problems that can be represented through linear algebra over an adjacency-matrix-based representation of the graph. Graph mining aims to discover complex structural patterns of a graph, for example, finding relationship patterns in social media networks or detecting link spam in web data. Due to their importance in machine learning, web applications, and social media, graph analytical problems have been extensively studied in the past decade. Practical solutions have been implemented in a wide variety of graph analytical systems. However, most of the existing systems for graph analytics are distributed frameworks, which suffer from one or more of the following drawbacks: (1) many of the (current and future) users performing graph analytics will be domain experts with limited computer science background, who are faced with the challenge of managing a cluster, which involves tasks such as data partitioning and fault tolerance that they are not familiar with; (2) not all users have access to an enterprise cluster in their daily development tasks; (3) distributed graph systems commonly suffer from large startup and communication overhead; and (4) load balancing in a distributed system is another major challenge, because some graph algorithms have dynamic working sets and it is thus hard to distribute the workload appropriately before execution.
In this dissertation, we identify three categories of graph workloads for which single-machine systems are more suitable than distributed systems: (1) analytical queries that do not need exact answers; (2) program analysis tasks that are widely used to find bugs in real-world software; and (3) graph mining algorithms that are important for many information-retrieval tasks. Based on these observations, we have developed a set of single-machine graph systems to deliver efficiency and scalability specifically for these workloads. In particular, this dissertation makes the following contributions. The first contribution is the design and implementation of a single-machine graph query system named GraphQ, which divides a large graph into partitions and merges them with guidance from an abstraction graph. By using multiple levels of abstraction, it can quickly rule out infeasible solutions and identify mergeable partitions. GraphQ uses the memory capacity as a budget and tries its best to find solutions before exhausting the memory, making it possible to answer analytical queries over very large graphs with resources affordable to a single PC. The second contribution is the design and implementation of Graspan, a single-machine, disk-based graph processing system tailored for interprocedural static analyses. Given a program graph and a grammar specification of an analysis, Graspan uses an edge-pair centric computation model to compute dynamic transitive closures on very large program graphs. With the help of novel graph processing techniques, we turn sophisticated code analyses into scalable Big Graph analytics. The third contribution of this dissertation is a single-machine, out-of-core graph mining system called RStream, which leverages disk support to enable efficient edge streaming for mining very large graphs. RStream employs a rich programming model that exposes relational algebra for developers to express a wide variety of mining tasks and implements a runtime engine that delivers efficiency with tuple streaming.
In conclusion, this dissertation attempts to explore the opportunities of building single-machine graph systems for scenarios where distributed systems do not work well. Our experimental results demonstrate that the techniques proposed in this dissertation can efficiently solve big graph analytical problems on a single consumer PC. We hope that these promising results will encourage future work to continue building affordable single-machine systems for a rich set of datasets and analytical tasks.
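
The description above mentions computing a dynamic transitive closure from a program graph and a grammar by matching pairs of edges. As a rough illustration of that general idea (not Graspan's actual implementation, which is disk-based and partitioned), the in-memory sketch below repeatedly joins adjacent edge pairs under productions of the form A ::= B C until no new edge appears. The function name `grammar_closure` and the dictionary encoding of productions are assumptions made for this example.

```python
from collections import defaultdict


def grammar_closure(edges, productions):
    """Grammar-guided transitive closure (illustrative sketch only).

    edges: iterable of (src, label, dst) triples.
    productions: dict mapping (B, C) -> A for each production A ::= B C.
    Returns the set of all edges derivable by joining adjacent edge pairs.
    """
    closure = set(edges)
    out_edges = defaultdict(set)  # src -> {(label, dst)}
    in_edges = defaultdict(set)   # dst -> {(src, label)}
    for s, lab, d in closure:
        out_edges[s].add((lab, d))
        in_edges[d].add((s, lab))

    worklist = list(closure)
    while worklist:
        u, lab, v = worklist.pop()
        derived = []
        # Use the popped edge as the left operand: u -lab-> v -c-> w
        for c, w in out_edges[v]:
            head = productions.get((lab, c))
            if head is not None:
                derived.append((u, head, w))
        # Use the popped edge as the right operand: t -b-> u -lab-> v
        for t, b in in_edges[u]:
            head = productions.get((b, lab))
            if head is not None:
                derived.append((t, head, v))
        # Index the new edges only after the scans, so no set is mutated mid-iteration.
        for edge in derived:
            if edge not in closure:
                closure.add(edge)
                out_edges[edge[0]].add((edge[1], edge[2]))
                in_edges[edge[2]].add((edge[0], edge[1]))
                worklist.append(edge)
    return closure


# Plain transitive closure expressed as the single production T ::= T T.
chain = [(1, "T", 2), (2, "T", 3), (3, "T", 4)]
result = grammar_closure(chain, {("T", "T"): "T"})
```

Richer analyses would supply more labels and productions; the fixed-point loop stays the same.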

Systems for Big Graph Analytics

Systems for Big Graph Analytics PDF Author: Da Yan
Publisher: Springer
ISBN: 3319582178
Category : Computers
Languages : en
Pages : 93

Book Description
There has been a surging interest in developing systems for analyzing big graphs generated by real applications, such as online social networks and knowledge graphs. This book aims to help readers get familiar with the computation models of various graph processing systems with minimal time investment. It is organized into three parts, addressing three popular computation models for big graph analytics: think-like-a-vertex, think-like-a-graph, and think-like-a-matrix. While vertex-centric systems have gained great popularity, the latter two models are currently being actively studied to solve graph problems that cannot be efficiently solved in the vertex-centric model, and are the promising next-generation models for big graph analytics. For each part, the authors introduce the state-of-the-art systems, emphasizing both their technical novelties and hands-on experience of using them. The systems introduced include Giraph, Pregel+, Blogel, GraphLab, GraphChi, X-Stream, Quegel, SystemML, etc. Readers will learn how to design graph algorithms in various graph analytics systems, and how to choose the most appropriate system for a particular application at hand. The target audience for this book includes beginners who are interested in using a big graph analytics system, and students, researchers, and practitioners who would like to build their own graph analytics systems with new features.
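
To give a feel for the think-like-a-vertex model named above, here is a minimal single-process sketch of superstep-style message passing that labels connected components by propagating the smallest vertex id (often called hash-min). It is not taken from the book and does not use the API of Giraph or any other system listed; the function name and the dictionary-based graph encoding are assumptions for illustration, and every neighbour is assumed to appear as a key of `adjacency`.

```python
def hash_min_components(adjacency):
    """Label each vertex with the smallest vertex id in its component.

    adjacency: dict mapping vertex -> list of neighbours
               (undirected graph, each edge listed in both directions).
    """
    label = {v: v for v in adjacency}

    # Superstep 0: every vertex sends its own id to all neighbours.
    inbox = {v: [] for v in adjacency}
    for v, neighbours in adjacency.items():
        for n in neighbours:
            inbox[n].append(label[v])

    # Later supersteps: a vertex is only active if it received messages.
    while any(inbox.values()):
        outbox = {v: [] for v in adjacency}
        for v, messages in inbox.items():
            if not messages:
                continue  # no incoming messages: the vertex stays halted
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in adjacency[v]:
                    outbox[n].append(smallest)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return label


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels = hash_min_components(graph)
```

The per-vertex body of the loop sees only that vertex's label, its incoming messages, and its neighbours, which is the restriction that lets real vertex-centric systems distribute the computation.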

Big Graph Analytics Platforms

Big Graph Analytics Platforms PDF Author: Da Yan
Publisher:
ISBN: 9781680832426
Category : Computers
Languages : en
Pages : 218

Book Description
A comprehensive survey that clearly summarizes the key features and techniques developed in existing big graph systems. It aims to help readers get a systematic picture of the landscape of recent big graph systems, focusing not just on the systems themselves, but also on the key innovations and design philosophies underlying them.

Big Graph Analytics Platforms

Big Graph Analytics Platforms PDF Author: Da Yan
Publisher:
ISBN:
Category :
Languages : en
Pages : 195

Book Description


Principles of Big Graph: In-depth Insight

Principles of Big Graph: In-depth Insight PDF Author:
Publisher: Elsevier
ISBN: 0323898114
Category : Computers
Languages : en
Pages : 460

Book Description
Principles of Big Graph: In-depth Insight, Volume 128 in the Advances in Computers series, highlights new advances in the field, presenting chapters on a variety of topics, including CESDAM: Centered subgraph data matrix for large graph representation, Bivariate, cluster and suitability analysis of NoSQL Solutions for big graph applications, An empirical investigation on Big Graph using deep learning, Analyzing correlation between quality and accuracy of graph clustering, geneBF: Filtering protein-coded gene graph data using bloom filter, Processing large graphs with an alternative representation, MapReduce based convolutional graph neural networks: A comprehensive review, Fast exact triangle counting in large graphs using SIMD acceleration, A comprehensive investigation on attack graphs, Qubit representation of a binary tree and its operations in quantum computation, Modified ML-KNN: Role of similarity measures and nearest neighbor configuration in multi label text classification on big social network graph data, Big graph based online learning through social networks, Community detection in large-scale real-world networks, Power rank: An interactive web page ranking algorithm, GA based energy efficient modelling of a wireless sensor network, The major challenges of big graph and their solutions: A review, and An investigation on socio-cyber crime graph. The volume provides an update on the issues and challenges faced by current researchers, updates on future research agendas, and advanced topics for intensive research.

Practical Graph Analytics with Apache Giraph

Practical Graph Analytics with Apache Giraph PDF Author: Roman Shaposhnik
Publisher: Apress
ISBN: 1484212517
Category : Computers
Languages : en
Pages : 320

Book Description
Practical Graph Analytics with Apache Giraph helps you build data mining and machine learning applications using the Apache Foundation’s Giraph framework for graph processing. This is the same framework as used by Facebook, Google, and other social media analytics operations to derive business value from vast amounts of interconnected data points. Graphs arise in a wealth of data scenarios and describe the connections that are naturally formed in both digital and real worlds. Examples of such connections abound in online social networks such as Facebook and Twitter, among users who rate movies from services like Netflix and Amazon Prime, and are useful even in the context of biological networks for scientific research. Whether in the context of business or science, viewing data as connected adds value by increasing the amount of information available to be drawn from that data and put to use in generating new revenue or scientific opportunities. Apache Giraph offers a simple yet flexible programming model targeted to graph algorithms and designed to scale easily to accommodate massive amounts of data. Originally developed at Yahoo!, Giraph is now a top-level project at the Apache Foundation, and it enlists contributors from companies such as Facebook, LinkedIn, and Twitter. Practical Graph Analytics with Apache Giraph brings the power of Apache Giraph to you, showing how to harness the power of graph processing for your own data by building sophisticated graph analytics applications using the very same framework that is relied upon by some of the largest players in the industry today.

Large-scale Graph Analysis: System, Algorithm and Optimization

Large-scale Graph Analysis: System, Algorithm and Optimization PDF Author: Yingxia Shao
Publisher: Springer Nature
ISBN: 9811539286
Category : Computers
Languages : en
Pages : 154

Book Description
This book introduces readers to a workload-aware methodology for large-scale graph algorithm optimization in graph-computing systems, and proposes several optimization techniques that can enable these systems to handle advanced graph algorithms efficiently. More concretely, it proposes a workload-aware cost model to guide the development of high-performance algorithms. On the basis of the cost model, the book subsequently presents a system-level optimization resulting in a partition-aware graph-computing engine, PAGE. In addition, it presents three efficient and scalable advanced graph algorithms – the subgraph enumeration, cohesive subgraph detection, and graph extraction algorithms. This book offers a valuable reference guide for junior researchers, covering the latest advances in large-scale graph analysis; and for senior researchers, sharing state-of-the-art solutions based on advanced graph algorithms. In addition, all readers will find a workload-aware methodology for designing efficient large-scale graph algorithms.

Data Analytics

Data Analytics PDF Author: Mohiuddin Ahmed
Publisher: CRC Press
ISBN: 0429820909
Category : Computers
Languages : en
Pages : 442

Book Description
Large data sets arriving at ever-increasing speeds require a new set of efficient data analysis techniques. Data analytics is becoming an essential component for every organization and in domains such as health care, financial trading, Internet of Things, Smart Cities, and Cyber-Physical Systems. However, these diverse application domains give rise to new research challenges. In this context, the book provides a broad picture of the concepts, techniques, applications, and open research directions in this area. In addition, it serves as a single source of reference for acquiring knowledge of emerging Big Data Analytics technologies.

A Declarative Framework for Big Graph Analytics and Their Provenance

A Declarative Framework for Big Graph Analytics and Their Provenance PDF Author: Vasiliki Papavasileiou
Publisher:
ISBN:
Category :
Languages : en
Pages : 127

Book Description
Recent years have witnessed an explosion in the size of graph data and the complexity of graph analytics in fields such as social and mobile networks, science, and advertisement. Analyzing and extracting knowledge from Big Graphs (in analogy to Big Data) is hard. The size of Big Graphs necessitates the use of distributed infrastructures and parallel programming. Moreover, implementing performant and correct analytics requires in-depth knowledge of both the algorithm and the input data. Developers of graph analytics face two major challenges: i) There is a myriad of Big Graph processing frameworks, each of which uses a different imperative programming language and implements different low-level optimizations. Developers are burdened with understanding the low-level characteristics of the execution framework that best suits their algorithms and data. ii) Assessing the quality of both data and analytics is a tedious and manual task. Devising new graph analytics is an iterative process, where developers incrementally refine their algorithms and clean their data by analyzing results, correcting errors, and rerunning until the end results are satisfactory. In this dissertation we offer a declarative framework that addresses the entire life-cycle, from designing to executing, of Big Graph analytics. Our approach uses a single language for both authoring graph analytics and fine-tuning them. Specifically, this dissertation makes the following two main contributions: We design and demonstrate Datalography, the first approach for declarative graph analytics on Vertex-Centric graph processing engines. To accommodate different programming models, we design and implement a compiler that takes general Datalog queries and rewrites them into distribution-aware queries that can be efficiently evaluated on any Vertex-Centric framework. Moreover, our compiler applies optimizations automatically and transparently to the user; because they take the form of logical query rewritings, they are portable to any Vertex-Centric system. We demonstrate the effectiveness of our approach with an experimental evaluation on real-world graphs that indicates Datalography offers superior performance when compared to native, imperative implementations.
Our second contribution is a novel provenance management approach that enables developers to customize provenance capturing and analysis, with twofold benefits: the amount of captured provenance is minimized to include only the necessary information, and analysis is extended beyond the traditional tracing queries. We present formal semantics of our provenance query language, based on Datalog, and identify an important class of queries that can be evaluated online, simultaneously with the graph analytic. We showcase our approach with Ariadne, a provenance management system that supports efficient debugging, auditing, and fine-tuning of graph analytics.
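
For readers unfamiliar with Datalog-style graph analytics and provenance capture, the sketch below evaluates a two-rule reachability program and records, for each derived fact, the edge that first produced it. It uses semi-naive evaluation, a standard Datalog technique chosen here for illustration; it is not the Datalography compiler or the Ariadne system, and the function names `reachable_with_provenance` and `explain` are hypothetical.

```python
def reachable_with_provenance(edges, source):
    """Evaluate the Datalog program
           reach(source).
           reach(Y) :- reach(X), edge(X, Y).
    semi-naively, recording which edge first derived each fact.
    """
    successors = {}
    for x, y in edges:
        successors.setdefault(x, []).append(y)

    provenance = {source: None}  # fact -> deriving edge (None for the base fact)
    delta = [source]             # facts that are new in the current round
    while delta:
        next_delta = []
        for x in delta:          # only join newly derived facts with edge/2
            for y in successors.get(x, []):
                if y not in provenance:
                    provenance[y] = (x, y)
                    next_delta.append(y)
        delta = next_delta
    return provenance


def explain(provenance, vertex):
    """Trace a derived fact back to the source as a list of edges."""
    path = []
    while provenance[vertex] is not None:
        path.append(provenance[vertex])
        vertex = provenance[vertex][0]
    return list(reversed(path))


edges = [("a", "b"), ("b", "c"), ("d", "e")]
prov = reachable_with_provenance(edges, "a")
trace = explain(prov, "c")
```

Storing a single deriving edge per fact is one way to keep captured provenance small while still answering a tracing query such as `explain`.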

Computer Science and Education

Computer Science and Education PDF Author: Wenxing Hong
Publisher: Springer Nature
ISBN: 981992443X
Category : Computers
Languages : en
Pages : 753

Book Description
This three-volume set constitutes selected papers presented during the 17th International Conference on Computer Science and Education, ICCSE 2022, held in Ningbo, China, in August 2022. The 168 full papers and 43 short papers presented were thoroughly reviewed and selected from the 510 submissions. They focus on a wide range of computer science topics, especially AI, data science, and engineering, and technology-based education, by addressing frontier technical and business issues essential to the applications of data science in both higher education and advancing e-Society.