On Software Infrastructure for Scalable Graph Analytics

On Software Infrastructure for Scalable Graph Analytics PDF Author: Yingyi Bu
Publisher:
ISBN: 9781339124087
Category :
Languages : en
Pages : 129

Get Book Here

Book Description
Recently, there is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large datasets. In the mean time, in real-world applications, it is highly desirable to reduce the tedious, inefficient ETL (extract, transform, load) gap between tabular data processing systems and graph processing systems. Unfortunately, those challenges have not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow, as well as the separation of tabular data processing runtimes and graph processing runtimes. In this thesis, we explore the application of programming techniques and algorithms from the database systems world to the problem of scalable graph analysis. We first propose a bloat-aware design paradigm towards the development of efficient and scalable Big Data applications in object-oriented, GC enabled languages and demonstrate that programming under this paradigm does not incur significant programming burden but obtains remarkable performance gains (e.g., 2.5X). Based on the design paradigm, we then build Pregelix, an open source distributed graph processing system which is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15X speedup compared to Apache Giraph and up to 35X speedup compared to distributed GraphLab). Finally, we integrate Pregelix with the open source Big Data management system AsterixDB to offer users a mix of a vertex-oriented programming model and a declarative query language for richer forms of Big Graph analytics with reduced ETL pains.

On Software Infrastructure for Scalable Graph Analytics

On Software Infrastructure for Scalable Graph Analytics PDF Author: Yingyi Bu
Publisher:
ISBN: 9781339124087
Category :
Languages : en
Pages : 129

Get Book Here

Book Description
Recently, there is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large datasets. In the mean time, in real-world applications, it is highly desirable to reduce the tedious, inefficient ETL (extract, transform, load) gap between tabular data processing systems and graph processing systems. Unfortunately, those challenges have not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow, as well as the separation of tabular data processing runtimes and graph processing runtimes. In this thesis, we explore the application of programming techniques and algorithms from the database systems world to the problem of scalable graph analysis. We first propose a bloat-aware design paradigm towards the development of efficient and scalable Big Data applications in object-oriented, GC enabled languages and demonstrate that programming under this paradigm does not incur significant programming burden but obtains remarkable performance gains (e.g., 2.5X). Based on the design paradigm, we then build Pregelix, an open source distributed graph processing system which is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15X speedup compared to Apache Giraph and up to 35X speedup compared to distributed GraphLab). Finally, we integrate Pregelix with the open source Big Data management system AsterixDB to offer users a mix of a vertex-oriented programming model and a declarative query language for richer forms of Big Graph analytics with reduced ETL pains.

Distributed Graph Analytics

Distributed Graph Analytics PDF Author: Unnikrishnan Cheramangalath
Publisher: Springer Nature
ISBN: 3030418863
Category : Computers
Languages : en
Pages : 207

Get Book Here

Book Description
This book brings together two important trends: graph algorithms and high-performance computing. Efficient and scalable execution of graph processing applications in data or network analysis requires innovations at multiple levels: algorithms, associated data structures, their implementation and tuning to a particular hardware. Further, programming languages and the associated compilers play a crucial role when it comes to automating efficient code generation for various architectures. This book discusses the essentials of all these aspects. The book is divided into three parts: programming, languages, and their compilation. The first part examines the manual parallelization of graph algorithms, revealing various parallelization patterns encountered, especially when dealing with graphs. The second part uses these patterns to provide language constructs that allow a graph algorithm to be specified. Programmers can work with these language constructs without worrying about their implementation, which is the focus of the third part. Implementation is handled by a compiler, which can specialize code generation for a backend device. The book also includes suggestive results on different platforms, which illustrate and justify the theory and practice covered. Together, the three parts provide the essential ingredients for creating a high-performance graph application. The book ends with a section on future directions, which offers several pointers to promising topics for future research. This book is intended for new researchers as well as graduate and advanced undergraduate students. Most of the chapters can be read independently by those familiar with the basics of parallel programming and graph algorithms. However, to make the material more accessible, the book includes a brief background on elementary graph algorithms, parallel computing and GPUs. Moreover it presents a case study using Falcon, a domain-specific language for graph algorithms, to illustrate the concepts.

Graph Algorithms in the Language of Linear Algebra

Graph Algorithms in the Language of Linear Algebra PDF Author: Jeremy Kepner
Publisher: SIAM
ISBN: 9780898719918
Category : Mathematics
Languages : en
Pages : 388

Get Book Here

Book Description
The current exponential growth in graph data has forced a shift to parallel computing for executing graph algorithms. Implementing parallel graph algorithms and achieving good parallel performance have proven difficult. This book addresses these challenges by exploiting the well-known duality between a canonical representation of graphs as abstract collections of vertices and edges and a sparse adjacency matrix representation. This linear algebraic approach is widely accessible to scientists and engineers who may not be formally trained in computer science. The authors show how to leverage existing parallel matrix computation techniques and the large amount of software infrastructure that exists for these computations to implement efficient and scalable parallel graph algorithms. The benefits of this approach are reduced algorithmic complexity, ease of implementation, and improved performance.

Software Foundations for Data Interoperability and Large Scale Graph Data Analytics

Software Foundations for Data Interoperability and Large Scale Graph Data Analytics PDF Author: Lu Qin
Publisher: Springer Nature
ISBN: 3030611337
Category : Computers
Languages : en
Pages : 203

Get Book Here

Book Description
This book constitutes refereed proceedings of the 4th International Workshop on Software Foundations for Data Interoperability, SFDI 2020, and 2nd International Workshop on Large Scale Graph Data Analytics, LSGDA 2020, held in Conjunction with VLDB 2020, in September 2020. Due to the COVID-19 pandemic the conference was held online. The 11 full papers and 4 short papers were thoroughly reviewed and selected from 38 submissions. The volme presents original research and application papers on the development of novel graph analytics models, scalable graph analytics techniques and systems, data integration, and data exchange.

Large-scale Graph Analysis: System, Algorithm and Optimization

Large-scale Graph Analysis: System, Algorithm and Optimization PDF Author: Yingxia Shao
Publisher: Springer Nature
ISBN: 9811539286
Category : Computers
Languages : en
Pages : 154

Get Book Here

Book Description
This book introduces readers to a workload-aware methodology for large-scale graph algorithm optimization in graph-computing systems, and proposes several optimization techniques that can enable these systems to handle advanced graph algorithms efficiently. More concretely, it proposes a workload-aware cost model to guide the development of high-performance algorithms. On the basis of the cost model, the book subsequently presents a system-level optimization resulting in a partition-aware graph-computing engine, PAGE. In addition, it presents three efficient and scalable advanced graph algorithms – the subgraph enumeration, cohesive subgraph detection, and graph extraction algorithms. This book offers a valuable reference guide for junior researchers, covering the latest advances in large-scale graph analysis; and for senior researchers, sharing state-of-the-art solutions based on advanced graph algorithms. In addition, all readers will find a workload-aware methodology for designing efficient large-scale graph algorithms.

Distributed Graph Analytics

Distributed Graph Analytics PDF Author: Unnikrishnan Cheramangalath
Publisher:
ISBN: 9783030418878
Category : Electronic books
Languages : en
Pages : 207

Get Book Here

Book Description
This book brings together two important trends: graph algorithms and high-performance computing. Efficient and scalable execution of graph processing applications in data or network analysis requires innovations at multiple levels: algorithms, associated data structures, their implementation and tuning to a particular hardware. Further, programming languages and the associated compilers play a crucial role when it comes to automating efficient code generation for various architectures. This book discusses the essentials of all these aspects. The book is divided into three parts: programming, languages, and their compilation. The first part examines the manual parallelization of graph algorithms, revealing various parallelization patterns encountered, especially when dealing with graphs. The second part uses these patterns to provide language constructs that allow a graph algorithm to be specified. Programmers can work with these language constructs without worrying about their implementation, which is the focus of the third part. Implementation is handled by a compiler, which can specialize code generation for a backend device. The book also includes suggestive results on different platforms, which illustrate and justify the theory and practice covered. Together, the three parts provide the essential ingredients for creating a high-performance graph application. The book ends with a section on future directions, which offers several pointers to promising topics for future research. This book is intended for new researchers as well as graduate and advanced undergraduate students. Most of the chapters can be read independently by those familiar with the basics of parallel programming and graph algorithms. However, to make the material more accessible, the book includes a brief background on elementary graph algorithms, parallel computing and GPUs. Moreover it presents a case study using Falcon, a domain-specific language for graph algorithms, to illustrate the concept s.

Frontiers in Massive Data Analysis

Frontiers in Massive Data Analysis PDF Author: National Research Council
Publisher: National Academies Press
ISBN: 0309287812
Category : Mathematics
Languages : en
Pages : 191

Get Book Here

Book Description
Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data. Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale-terabytes and petabytes-is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge-from computer science, statistics, machine learning, and application disciplines-that must be brought to bear to make useful inferences from massive data.

Analyzing Analytics

Analyzing Analytics PDF Author: Rajesh Bordawekar
Publisher: Morgan & Claypool Publishers
ISBN: 1627058362
Category : Computers
Languages : en
Pages : 126

Get Book Here

Book Description
This book aims to achieve the following goals: (1) to provide a high-level survey of key analytics models and algorithms without going into mathematical details; (2) to analyze the usage patterns of these models; and (3) to discuss opportunities for accelerating analytics workloads using software, hardware, and system approaches. The book first describes 14 key analytics models (exemplars) that span data mining, machine learning, and data management domains. For each analytics exemplar, we summarize its computational and runtime patterns and apply the information to evaluate parallelization and acceleration alternatives for that exemplar. Using case studies from important application domains such as deep learning, text analytics, and business intelligence (BI), we demonstrate how various software and hardware acceleration strategies are implemented in practice. This book is intended for both experienced professionals and students who are interested in understanding core algorithms behind analytics workloads. It is designed to serve as a guide for addressing various open problems in accelerating analytics workloads, e.g., new architectural features for supporting analytics workloads, impact on programming models and runtime systems, and designing analytics systems.

Practical Graph Analytics with Apache Giraph

Practical Graph Analytics with Apache Giraph PDF Author: Roman Shaposhnik
Publisher: Apress
ISBN: 1484212517
Category : Computers
Languages : en
Pages : 320

Get Book Here

Book Description
Practical Graph Analytics with Apache Giraph helps you build data mining and machine learning applications using the Apache Foundation’s Giraph framework for graph processing. This is the same framework as used by Facebook, Google, and other social media analytics operations to derive business value from vast amounts of interconnected data points. Graphs arise in a wealth of data scenarios and describe the connections that are naturally formed in both digital and real worlds. Examples of such connections abound in online social networks such as Facebook and Twitter, among users who rate movies from services like Netflix and Amazon Prime, and are useful even in the context of biological networks for scientific research. Whether in the context of business or science, viewing data as connected adds value by increasing the amount of information available to be drawn from that data and put to use in generating new revenue or scientific opportunities. Apache Giraph offers a simple yet flexible programming model targeted to graph algorithms and designed to scale easily to accommodate massive amounts of data. Originally developed at Yahoo!, Giraph is now a top top-level project at the Apache Foundation, and it enlists contributors from companies such as Facebook, LinkedIn, and Twitter. Practical Graph Analytics with Apache Giraph brings the power of Apache Giraph to you, showing how to harness the power of graph processing for your own data by building sophisticated graph analytics applications using the very same framework that is relied upon by some of the largest players in the industry today.

Towards Scalable Graph Analytics on Time Dependent Graphs

Towards Scalable Graph Analytics on Time Dependent Graphs PDF Author:
Publisher:
ISBN:
Category :
Languages : en
Pages : 11

Get Book Here

Book Description