Compiler-assisted Workload Consolidation to Efficiently Exploit Dynamic Parallelism for Recursive Applications

Compiler-assisted Workload Consolidation to Efficiently Exploit Dynamic Parallelism for Recursive Applications PDF Author: Hancheng Wu (Researcher on electrical engineering)
Publisher:
ISBN:
Category :
Languages : en
Pages : 40

Get Book Here

Book Description
GPUs have been widely used to parallelize and accelerate applications for its high throughput. Traditionally, a GPU function can only be launched from the CPU side. This results in the fact that GPUs are preferable for those application which express a flat data parallelism, a simple data parallelism that is known at compiling time and can be easily distributed to different GPU blocks and threads. However, for those applications that contain nested data parallelism, which is not known a priori and can only be discovered at running time, it is difficult to write a GPU function that achieve high performance on parallelization and acceleration. One can easily end up with either a too coarse-grained or too fine-grained GPU function. Since Kepler architecture, Nvidia introduced a new feature -- Dynamic Parallelism (DP), which enables the initiation of GPU functions from inside a GPU function. This makes the nested parallelism easy to be explored on GPU since one can program in a way that a new GPU function can be launched whenever a local nested parallelism is met during the execution. What is more, DP makes implementing recursion on GPU without the intervention of CPUs possible. Many computations exhibit a pattern of nested data parallelism and among those is parallel recursion. However, preliminary data shows that simple DP-based implementations of recursion result in poor performance. This work focus on how to efficiently exploit DP for parallel recursive applications on GPU. Specifically, the goal is to free the users from programming with the complexity of GPUs' hardware and software and to automatically generate high performance GPU recursive functions implemented with DP given the inputs of simple parallel CPU recursive functions. To this end, first, I propose several DP-based parallel recursive templates that can be generated from a serial CPU recursive function. I compare the parallel recursive templates with non DP-based counterparts (flat kernels) to see if using DP in parallel recursive application can be beneficial or not. Second, to reduce the overhead of DP, I propose compiler techniques that improve the efficiency of simple DP-based parallel recursive functions by performing workload consolidation. My evaluation shows that GPU kernels consolidated with the proposed code transformations achieve an average speedup in the order of 1500x over basic implementations using DP and an average speedup of 3.9x over optimized flat GPU kernels for both tree traversal and graph based applications.

Compiler-assisted Workload Consolidation to Efficiently Exploit Dynamic Parallelism for Recursive Applications

Compiler-assisted Workload Consolidation to Efficiently Exploit Dynamic Parallelism for Recursive Applications PDF Author: Hancheng Wu (Researcher on electrical engineering)
Publisher:
ISBN:
Category :
Languages : en
Pages : 40

Get Book Here

Book Description
GPUs have been widely used to parallelize and accelerate applications for its high throughput. Traditionally, a GPU function can only be launched from the CPU side. This results in the fact that GPUs are preferable for those application which express a flat data parallelism, a simple data parallelism that is known at compiling time and can be easily distributed to different GPU blocks and threads. However, for those applications that contain nested data parallelism, which is not known a priori and can only be discovered at running time, it is difficult to write a GPU function that achieve high performance on parallelization and acceleration. One can easily end up with either a too coarse-grained or too fine-grained GPU function. Since Kepler architecture, Nvidia introduced a new feature -- Dynamic Parallelism (DP), which enables the initiation of GPU functions from inside a GPU function. This makes the nested parallelism easy to be explored on GPU since one can program in a way that a new GPU function can be launched whenever a local nested parallelism is met during the execution. What is more, DP makes implementing recursion on GPU without the intervention of CPUs possible. Many computations exhibit a pattern of nested data parallelism and among those is parallel recursion. However, preliminary data shows that simple DP-based implementations of recursion result in poor performance. This work focus on how to efficiently exploit DP for parallel recursive applications on GPU. Specifically, the goal is to free the users from programming with the complexity of GPUs' hardware and software and to automatically generate high performance GPU recursive functions implemented with DP given the inputs of simple parallel CPU recursive functions. To this end, first, I propose several DP-based parallel recursive templates that can be generated from a serial CPU recursive function. I compare the parallel recursive templates with non DP-based counterparts (flat kernels) to see if using DP in parallel recursive application can be beneficial or not. Second, to reduce the overhead of DP, I propose compiler techniques that improve the efficiency of simple DP-based parallel recursive functions by performing workload consolidation. My evaluation shows that GPU kernels consolidated with the proposed code transformations achieve an average speedup in the order of 1500x over basic implementations using DP and an average speedup of 3.9x over optimized flat GPU kernels for both tree traversal and graph based applications.

Applications, Tools and Techniques on the Road to Exascale Computing

Applications, Tools and Techniques on the Road to Exascale Computing PDF Author: Koen de Bosschere
Publisher: IOS Press
ISBN: 1614990409
Category : Computers
Languages : en
Pages : 688

Get Book Here

Book Description
Single processing units have now reached a point where further major improvements in their performance are restricted by their physical limitations. This is causing a slowing down in advances at the same time as new scientific challenges are demanding exascale speed. This has meant that parallel processing has become key to High Performance Computing (HPC). This book contains the proceedings of the 14th biennial ParCo conference, ParCo2011, held in Ghent, Belgium. The ParCo conferences have traditionally concentrated on three main themes: Algorithms, Architectures and Applications. Nowadays though, the focus has shifted from traditional multiprocessor topologies to heterogeneous and manycores, incorporating standard CPUs, GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays). These platforms are, at a higher abstraction level, integrated in clusters, grids and clouds. The papers presented here reflect this change of focus. New architectures, programming tools and techniques are also explored, and the need for exascale hardware and software was also discussed in the industrial session of the conference.This book will be of interest to all those interested in parallel computing today, and progress towards the exascale computing of tomorrow.

Mastering Cloud Computing

Mastering Cloud Computing PDF Author: Rajkumar Buyya
Publisher: Newnes
ISBN: 0124095399
Category : Computers
Languages : en
Pages : 469

Get Book Here

Book Description
Mastering Cloud Computing is designed for undergraduate students learning to develop cloud computing applications. Tomorrow's applications won't live on a single computer but will be deployed from and reside on a virtual server, accessible anywhere, any time. Tomorrow's application developers need to understand the requirements of building apps for these virtual systems, including concurrent programming, high-performance computing, and data-intensive systems. The book introduces the principles of distributed and parallel computing underlying cloud architectures and specifically focuses on virtualization, thread programming, task programming, and map-reduce programming. There are examples demonstrating all of these and more, with exercises and labs throughout. - Explains how to make design choices and tradeoffs to consider when building applications to run in a virtual cloud environment - Real-world case studies include scientific, business, and energy-efficiency considerations

GPU Computing Gems Jade Edition

GPU Computing Gems Jade Edition PDF Author: Wen-mei Hwu
Publisher: Elsevier
ISBN: 0123859638
Category : Computers
Languages : en
Pages : 562

Get Book Here

Book Description
"Since the introduction of CUDA in 2007, more than 100 million computers with CUDA capable GPUs have been shipped to end users. GPU computing application developers can now expect their application to have a mass market. With the introduction of OpenCL in 2010, researchers can now expect to develop GPU applications that can run on hardware from multiple vendors"--

Parallel Computing

Parallel Computing PDF Author: Christian Bischof
Publisher: IOS Press
ISBN: 158603796X
Category : Computers
Languages : en
Pages : 824

Get Book Here

Book Description
ParCo2007 marks a quarter of a century of the international conferences on parallel computing that started in Berlin in 1983. The aim of the conference is to give an overview of the developments, applications and future trends in high-performance computing for various platforms.

Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing PDF Author: Leando Soares Indrusiak
Publisher: CRC Press
ISBN: 1000794385
Category : Computers
Languages : en
Pages : 177

Get Book Here

Book Description
The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large scale manycore systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for manycore systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum between systems with little flexibility and strict real-time guarantees all the way to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: • Load and Resource Models• Admission Control• Feedback-based Allocation and Optimisation• Search-based Allocation Heuristics• Distributed Allocation based on Swarm Intelligence• Value-Based AllocationEach of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip manycore processors, grids and private cloud environments.

Optimizing Compilers for Modern Architectures: A Dependence-Based Approach

Optimizing Compilers for Modern Architectures: A Dependence-Based Approach PDF Author: Randy Allen
Publisher: Morgan Kaufmann Publishers
ISBN: 9781493303540
Category : Computers
Languages : en
Pages : 790

Get Book Here

Book Description
Modern computer architectures designed with high-performance microprocessors offer tremendous potential gains in performance over previous designs. Yet their very complexity makes it increasingly difficult to produce efficient code and to realize their full potential. This landmark text from two leaders in the field focuses on the pivotal role that compilers can play in addressing this critical issue. The basis for all the methods presented in this book is data dependence, a fundamental compiler analysis tool for optimizing programs on high-performance microprocessors and parallel architectures. It enables compiler designers to write compilers that automatically transform simple, sequential programs into forms that can exploit special features of these modern architectures. The text provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling. The authors demonstrate the importance and wide applicability of dependence-based compiler optimizations and give the compiler writer the basics needed to understand and implement them. They also offer cookbook explanations for transforming applications by hand to computational scientists and engineers who are driven to obtain the best possible performance of their complex applications. The approaches presented are based on research conducted over the past two decades, emphasizing the strategies implemented in research prototypes at Rice University and in several associated commercial systems. Randy Allen and Ken Kennedy have provided an indispensable resource for researchers, practicing professionals, and graduate students engaged in designing and optimizing compilers for modern computer architectures. * Offers a guide to the simple, practical algorithms and approaches that are most effective in real-world, high-performance microprocessor and parallel systems. * Demonstrates each transformation in worked examples. * Examines how two case study compilers implement the theories and practices described in each chapter. * Presents the most complete treatment of memory hierarchy issues of any compiler text. * Illustrates ordering relationships with dependence graphs throughout the book. * Applies the techniques to a variety of languages, including Fortran 77, C, hardware definition languages, Fortran 90, and High Performance Fortran. * Provides extensive references to the most sophisticated algorithms known in research.

Workload Modeling for Computer Systems Performance Evaluation

Workload Modeling for Computer Systems Performance Evaluation PDF Author: Dror G. Feitelson
Publisher: Cambridge University Press
ISBN: 1107078237
Category : Computers
Languages : en
Pages : 569

Get Book Here

Book Description
A book for experts and practitioners, emphasizing the intuition and reasoning behind definitions and derivations related to evaluating computer systems performance.

Cloud Computing and Software Services

Cloud Computing and Software Services PDF Author: Syed A. Ahson
Publisher: CRC Press
ISBN: 9781439803165
Category : Computers
Languages : en
Pages : 458

Get Book Here

Book Description
Whether you're already in the cloud, or determining whether or not it makes sense for your organization, Cloud Computing and Software Services: Theory and Techniques provides the technical understanding needed to develop and maintain state-of-the-art cloud computing and software services. From basic concepts and recent research findings to fut

Handbook of Cloud Computing

Handbook of Cloud Computing PDF Author: Nayyar Dr. Anand
Publisher: BPB Publications
ISBN: 9388511506
Category : Computers
Languages : en
Pages : 420

Get Book Here

Book Description
Great POSSIBILITIES and high future prospects to become ten times folds in the near FUTUREKey features Comprehensively gives clear picture of current state-of-the-art aspect of cloud computing by elaborating terminologies, models and other related terms. Enlightens all major players in Cloud Computing industry providing services in terms of SaaS, PaaS and IaaS. Highlights Cloud Computing Simulators, Security Aspect and Resource Allocation. In-depth presentation with well-illustrated diagrams and simple to understand technical concepts of cloud. Description The book "e;Handbook of Cloud Computing"e; provides the latest and in-depth information of this relatively new and another platform for scientific computing which has great possibilities and high future prospects to become ten folds in near future. The book covers in comprehensive manner all aspects and terminologies associated with cloud computing like SaaS, PaaS and IaaS and also elaborates almost every cloud computing service model.The book highlights several other aspects of cloud computing like Security, Resource allocation, Simulation Platforms and futuristic trend i.e. Mobile cloud computing. The book will benefit all the readers with all in-depth technical information which is required to understand current and futuristic concepts of cloud computing. No prior knowledge of cloud computing or any of its related technology is required in reading this book. What will you learn Cloud Computing, Virtualisation Software as a Service, Platform as a Service, Infrastructure as a Service Data in Cloud and its Security Cloud Computing - Simulation, Mobile Cloud Computing Specific Cloud Service Models Resource Allocation in Cloud Computing Who this book is for Students of Polytechnic Diploma Classes- Computer Science/ Information Technology Graduate Students- Computer Science/ CSE / IT/ Computer Applications Master Class Students-Msc (CS/IT)/ MCA/ M.Phil, M.Tech, M.S. Researcher's-Ph.D Research Scholars doing work in Virtualization, Cloud Computing and Cloud Security Industry Professionals- Preparing for Certifications, Implementing Cloud Computing and even working on Cloud Security Table of contents1. Introduction to Cloud Computing2. Virtualisation3. Software as a Service4. Platform as a Service5. Infrastructure as a Service6. Data in Cloud7. Cloud Security 8. Cloud Computing - Simulation9. Specific Cloud Service Models10. Resource Allocation in Cloud Computing11. Mobile Cloud Computing About the authorDr. Anand Nayyar received Ph.D (Computer Science) in Wireless Sensor Networks and Swarm Intelligence. Presently he is working in Graduate School, Duy Tan University, Da Nang, Vietnam. He has total of fourteen Years of Teaching, Research and Consultancy experience with more than 250 Research Papers in various International Conferences and highly reputed journals. He is certified Professional with more than 75 certificates and member of 50 Professional Organizations. He is acting as "e;ACM DISTINGUISHED SPEAKER"e;