Characterizing and Improving Graph Algorithm Performance on Multicore Systems

Author: Nicole Celeste Rodia
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The rise of big data analytics has contributed to the growing popularity and scale of graph datasets, positioning graph analysis as an important research area. Graph analysis is an essential tool in many domains, including the physical and social sciences, healthcare, business intelligence, and cybersecurity. The increasing scale of graph analysis problems, with graphs containing millions or billions of vertices and edges, has made parallel and distributed graph algorithms essential for effective analysis of these large datasets. At the same time, modern multicore systems have been scaling to higher core counts, with dozens of complex cores in a single system. At first glance, it would seem that graph algorithms can leverage data-level parallelism across graph vertices and edges to utilize this large number of cores to quickly process large datasets. In fact, on multicore systems, graph algorithms are typically inefficient and perform poorly. The real-world informatics graphs used for today's big data analytics are derived from online social networks, web page links, genomics data, and the like. These networks possess fundamental properties that differ from traditional graphs like trees or meshes, resulting in different execution characteristics. We study the factors behind this lack of performance and demonstrate software and hardware techniques that improve performance. First, we analyze the performance characteristics of a core set of graph analysis algorithms across several informatics, physical, and synthetic graph datasets using a multicore microarchitectural simulator. Our characterization indicates that poor performance is due to several factors, including irregular data access patterns, load imbalance, high communication-to-computation ratio, and ineffective caching techniques. To investigate the potential for caching to improve graph algorithm performance, we study the algorithms' data locality. Cache miss rates are an unreliable metric for data locality because they are heavily influenced by dataset size, cache size, and replacement policy. Thus, we use cache-independent locality analysis techniques, including reuse distance and a probability-based locality score, to analyze data locality in graph algorithms. Based on our analysis of data locality, we find that LRU-based cache replacement policies do not provide good performance for the data access patterns characteristic of graph algorithms. Further, we show that data access patterns correlate with algorithm characteristics, graph dataset structure, and vertex degree. These insights indicate that utilization of algorithm- and dataset-specific locality information paired with an improved cache replacement policy could significantly improve graph algorithm performance. Second, we employ our knowledge of real-world graph properties to redesign the algorithm for detecting strongly connected components (SCCs) in a directed graph, a fundamental graph analysis algorithm used in many scientific and engineering domains. Traditional approaches in parallel SCC detection show limited performance and poor scaling behavior when applied to large real-world graph instances. We investigate the shortcomings of the conventional approach and propose a series of extensions that account for the fundamental properties of real-world graphs, particularly the small-world property. Our scalable implementation offers excellent performance on diverse small-world graphs, resulting in a factor of 5 to 29 times parallel speedup over an optimal sequential algorithm on 16 cores and 32 hardware threads. Third, we propose a new cache replacement policy based on our observations of data locality in graph algorithms. The Graph Priority Insertion Policy (GPIP) uses per-data-structure software priority hints to improve last-level cache hit rates by maintaining data with higher locality in the cache. This policy provides an average reduction in misses per thousand instructions (MPKI) of 3% over least-recently-used (LRU) replacement. Overall, our contributions serve to expand understanding of the characteristics of graph algorithms and improve graph algorithm performance through both software and hardware means.
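The description above names reuse distance as a cache-independent locality metric. As a rough illustration only (this is not code from the dissertation, and the function names `reuse_distances` and `lru_hits` are hypothetical), the sketch below computes the reuse distance of each access in a trace as the number of distinct addresses touched since the previous access to the same address. It uses a simple quadratic-time LRU stack rather than an optimized tree-based method.

```python
from typing import Hashable, Iterable, List, Optional


def reuse_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return the reuse (LRU stack) distance of every access in the trace.

    The distance is the number of distinct addresses accessed since the
    previous access to the same address; first-time accesses yield None.
    """
    stack: List[Hashable] = []  # least recently used first, most recent last
    distances: List[Optional[int]] = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Everything above idx was touched since the last access to addr.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)  # cold access: no previous use
        stack.append(addr)
    return distances


def lru_hits(distances: List[Optional[int]], capacity: int) -> int:
    """Count hits in a fully associative LRU cache holding `capacity` lines.

    An access hits exactly when its reuse distance is smaller than capacity.
    """
    return sum(1 for d in distances if d is not None and d < capacity)


if __name__ == "__main__":
    dists = reuse_distances(list("abcabb"))
    print(dists)
    print(lru_hits(dists, capacity=2), lru_hits(dists, capacity=3))
```

Because the distances depend only on the access trace, one pass can be reused to estimate hit counts for any fully associative LRU capacity, which is the sense in which the metric is independent of a particular cache configuration.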

Understanding and Improving Graph Algorithm Performance

Author: Scott Beamer
Publisher:
ISBN:
Category :
Languages : en
Pages : 151

Book Description
Graph processing is experiencing a surge of renewed interest as applications in social networks and their analysis have grown in importance. Additionally, graph algorithms have found new applications in speech recognition and the sciences. In order to deliver the full potential of these emerging applications, graph processing must become substantially more efficient, as graph processing's communication-intensive nature often results in low arithmetic intensity that underutilizes available hardware platforms. To improve graph algorithm performance, this dissertation characterizes graph processing workloads on shared memory multiprocessors in order to understand graph algorithm performance. By querying performance counters to measure utilizations on real hardware, we find that contrary to prevailing wisdom, caches provide great benefit for graph processing and the systems are rarely memory bandwidth bound. Leveraging the insights of our workload characterization, we introduce the Graph Algorithm Iron Law (GAIL), a simple performance model that allows for reasoning about tradeoffs across layers by considering algorithmic efficiency, cache locality, and memory bandwidth utilization. We also provide the Graph Algorithm Platform (GAP) Benchmark Suite to help the community improve graph processing evaluations through standardization. In addition to understanding graph algorithm performance, we make contributions to improve graph algorithm performance. We present our direction-optimizing breadth-first search algorithm that is advantageous for low-diameter graphs, which are becoming increasingly relevant as social network analysis becomes more prevalent. Finally, we introduce propagation blocking, a technique to reduce memory communication on cache-based systems by blocking graph computations in order to improve spatial locality.
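The direction-optimizing breadth-first search mentioned above alternates between a conventional top-down step and a bottom-up step in which unvisited vertices search for a parent in the current frontier. The following is a minimal sequential sketch under stated assumptions, not the dissertation's implementation: the graph is undirected and given as an adjacency dictionary, and the switching rule (go bottom-up when the frontier exceeds a fraction `threshold` of all vertices) is a simplified stand-in for the actual heuristic. The name `direction_optimizing_bfs` and its parameters are hypothetical.

```python
from typing import Dict, List, Set


def direction_optimizing_bfs(
    adj: Dict[int, List[int]], source: int, threshold: float = 0.05
) -> Dict[int, int]:
    """Return BFS depths from `source` in an undirected graph.

    `threshold` is an assumed, simplified switching rule: when the frontier
    holds more than threshold * |V| vertices, a bottom-up step is used.
    """
    n = len(adj)
    depth: Dict[int, int] = {source: 0}
    frontier: Set[int] = {source}
    level = 0
    while frontier:
        level += 1
        next_frontier: Set[int] = set()
        if len(frontier) > threshold * n:
            # Bottom-up: every unvisited vertex looks for any neighbor in the
            # frontier and stops at the first one it finds.
            for v in adj:
                if v not in depth:
                    for u in adj[v]:
                        if u in frontier:
                            depth[v] = level
                            next_frontier.add(v)
                            break
        else:
            # Top-down: every frontier vertex visits its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if v not in depth:
                        depth[v] = level
                        next_frontier.add(v)
        frontier = next_frontier
    return depth


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    print(direction_optimizing_bfs(graph, 0))
```

The early `break` in the bottom-up step is where the savings come from on low-diameter graphs: once the frontier is large, most unvisited vertices find a parent after checking only a few edges.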

Large-scale Graph Analysis: System, Algorithm and Optimization

Author: Yingxia Shao
Publisher: Springer Nature
ISBN: 9811539286
Category : Computers
Languages : en
Pages : 154

Book Description
This book introduces readers to a workload-aware methodology for large-scale graph algorithm optimization in graph-computing systems, and proposes several optimization techniques that can enable these systems to handle advanced graph algorithms efficiently. More concretely, it proposes a workload-aware cost model to guide the development of high-performance algorithms. On the basis of the cost model, the book subsequently presents a system-level optimization resulting in a partition-aware graph-computing engine, PAGE. In addition, it presents three efficient and scalable advanced graph algorithms – the subgraph enumeration, cohesive subgraph detection, and graph extraction algorithms. This book offers a valuable reference guide for junior researchers, covering the latest advances in large-scale graph analysis; and for senior researchers, sharing state-of-the-art solutions based on advanced graph algorithms. In addition, all readers will find a workload-aware methodology for designing efficient large-scale graph algorithms.

Algorithm Design on Multicore Processors for Massive-data Analysis

Author: Virat Agarwal
Publisher:
ISBN:
Category : Algorithms
Languages : en
Pages :

Book Description
Analyzing massive data sets and streams is computationally very challenging. Data sets in systems biology, network analysis and security use network abstraction to construct large-scale graphs. Graph algorithms such as traversal and search are memory-intensive and typically require very little computation, with access patterns that are irregular and fine-grained. The increasing streaming data rates in various domains such as security, mining, and finance leave algorithm designers with only a handful of clock cycles (with current general-purpose computing technology) to process every incoming byte of data in-core in real time. This, along with the increasing complexity of mining patterns and other analytics, puts further pressure on already high computational requirements. Processing streaming data in finance comes with the additional constraint of low latency, which restricts the algorithm from using common techniques such as batching to obtain high throughput. The primary contributions of this dissertation are the design of novel parallel data analysis algorithms for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive streaming data, financial market data feed processing and analytics, and data transformation. These algorithms capture the machine-independent aspects, to guarantee portability with performance to future processors, and come with high-performance implementations on multicore processors that embed processor-specific optimizations. Our breadth-first search graph traversal algorithm demonstrates a capability to process massive graphs with billions of vertices and edges on commodity multicore processors at rates that are competitive with supercomputing results in the recent literature. We also present high-performance scalable keyword scanning on streaming data using a novel automata compression algorithm, a model of computation based on small software content addressable memories (CAMs), and a unique data layout that forces data re-use and minimizes memory traffic. Using a high-level algorithmic approach to process financial feeds, we present a solution that decodes and normalizes option market data at rates an order of magnitude higher than the current needs of the market, yet is portable and flexible to other feeds in this domain. In this dissertation we discuss in detail the algorithm design challenges of processing massive data and present solutions and techniques that we believe can be used and extended to solve future research problems in this domain.

Performance Analysis and Tuning on Modern CPUs

Author:
Publisher: Independently Published
ISBN:
Category :
Languages : en
Pages : 238

Book Description
Performance tuning is becoming more important than it has been for the last 40 years. Read this book to understand the performance of your application running on a modern CPU and learn how you can improve it. The 170+ page guide combines the knowledge of many optimization experts from different industries.

Graph Algorithms

Author: Mark Needham
Publisher: "O'Reilly Media, Inc."
ISBN: 1492047635
Category : Computers
Languages : en
Pages : 297

Book Description
Discover how graph algorithms can help you leverage the relationships within your data to develop more intelligent solutions and enhance your machine learning models. You’ll learn how graph analytics are uniquely suited to unfold complex structures and reveal difficult-to-find patterns lurking in your data. Whether you are trying to build dynamic network models or forecast real-world behavior, this book illustrates how graph algorithms deliver value, from finding vulnerabilities and bottlenecks to detecting communities and improving machine learning predictions. This practical book walks you through hands-on examples of how to use graph algorithms in Apache Spark and Neo4j, two of the most common choices for graph analytics. Also included: sample code and tips for over 20 practical graph algorithms that cover optimal pathfinding, importance through centrality, and community detection.
Learn how graph analytics vary from conventional statistical analysis
Understand how classic graph algorithms work, and how they are applied
Get guidance on which algorithms to use for different types of questions
Explore algorithm examples with working code and sample datasets from Spark and Neo4j
See how connected feature extraction can increase machine learning accuracy and precision
Walk through creating an ML workflow for link prediction combining Neo4j and Spark

Software Development for Embedded Multi-core Systems

Author: Max Domeika
Publisher: Newnes
ISBN: 0080558585
Category : Technology & Engineering
Languages : en
Pages : 435

Book Description
The multicore revolution has reached the deployment stage in embedded systems ranging from small ultramobile devices to large telecommunication servers. The transition from single to multicore processors, motivated by the need to increase performance while conserving power, has placed great responsibility on the shoulders of software engineers. In this new embedded multicore era, the toughest task is the development of code to support more sophisticated systems. This book provides embedded engineers with solid grounding in the skills required to develop software targeting multicore processors. Within the text, the author undertakes an in-depth exploration of performance analysis, and a close-up look at the tools of the trade. Both general multicore design principles and processor-specific optimization techniques are revealed. Detailed coverage of critical issues for multicore employment within embedded systems is provided, including the Threading Development Cycle, with discussions of analysis, design, development, debugging, and performance tuning of threaded applications. Software development techniques engendering optimal mobility and energy efficiency are highlighted through multiple case studies, which provide practical “how-to” advice on implementing the latest multicore processors. Finally, future trends are discussed, including terascale, speculative multithreading, transactional memory, interconnects, and the software-specific implications of these looming architectural developments.
This is the only book to explain software optimization for embedded multi-core systems
Helpful tips, tricks and design secrets from an Intel programming expert, with detailed examples using the popular X86 architecture
Covers hot topics, including ultramobile devices, low-power designs, Pthreads vs. OpenMP, and heterogeneous cores

Frontier Computing

Author: Jason C. Hung
Publisher: Springer Nature
ISBN: 9819914280
Category : Technology & Engineering
Languages : en
Pages : 2016

Book Description
This book gathers the proceedings of the 12th International Conference on Frontier Computing, held in Tokyo, Japan, on July 12–15, 2022, and provides comprehensive coverage of the latest advances and trends in information technology, science, and engineering. It addresses a number of broad themes, including communication networks, business intelligence and knowledge management, Web intelligence, and related fields that inspire the development of information technology. The respective contributions cover a wide range of topics: database and data mining, networking and communications, Web and Internet of things, embedded systems, soft computing, social network analysis, security and privacy, optical communication, and ubiquitous/pervasive computing. Many of the papers outline promising future research directions, and the book benefits students, researchers, and professionals alike. Further, it offers a useful reference guide for newcomers to the field.

Internet and Distributed Computing Systems

Author: Wenfeng Li
Publisher: Springer
ISBN: 3319459406
Category : Computers
Languages : en
Pages : 531

Book Description
This book constitutes the proceedings of the 9th International Conference on Internet and Distributed Computing Systems, IDCS 2016, held in Wuhan, China, in September 2016. The 30 full papers and 18 short papers presented in this volume were carefully reviewed and selected from 78 submissions. They were organized in topical sections named: body sensor networks and wearable devices; cloud computing and networking; distributed computing and big data; distributed scheduling and optimization; internet of things and its application; smart networked transportation and logistics; and big data and social networks.

Algorithms for Sparse Linear Systems

Author: Jennifer Scott
Publisher: Springer Nature
ISBN: 3031258207
Category : Mathematics
Languages : en
Pages : 254

Book Description
Large sparse linear systems of equations are ubiquitous in science, engineering and beyond. This open access monograph focuses on factorization algorithms for solving such systems. It presents classical techniques for complete factorizations that are used in sparse direct methods and discusses the computation of approximate direct and inverse factorizations that are key to constructing general-purpose algebraic preconditioners for iterative solvers. A unified framework is used that emphasizes the underlying sparsity structures and highlights the importance of understanding sparse direct methods when developing algebraic preconditioners. Theoretical results are complemented by sparse matrix algorithm outlines. This monograph is aimed at students of applied mathematics and scientific computing, as well as computational scientists and software developers who are interested in understanding the theory and algorithms needed to tackle sparse systems. It is assumed that the reader has completed a basic course in linear algebra and numerical mathematics.