Modeling Performance and Power for Energy-efficient GPGPU Computing

Author: Sunpyo Hong
Publisher:
ISBN:
Category : Computer architecture
Languages : en
Pages :

Book Description
The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures, and to propose a mechanism that leverages the analytical model to enable energy-efficient execution of an application. The key insight of the model is to investigate and quantify the complex relationship between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes the application type. Using these metrics together with architectural and application parameters, the model produces the overall application performance. The model relies on statically available parameters such as instruction-mix information and input-data size, and its average prediction error is 13.3% for the GPU-computing benchmarks. Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Using the proposed integrated power and performance (IPP) framework, the results show that different optimization points exist for a GPU architecture depending on the application type. The work shows that by activating fewer cores, 10.99% of run-time energy consumption can be saved for the bandwidth-limited benchmarks, and 25.8% energy savings are projected when core-level power gating is employed. Finally, the model is extended to a throughput model using OpenCL in order to target a wider variety of processors. First, multiple performance-related outputs are predicted, including upper-bound and lower-bound values. Second, using the model parameters, an application can be classified into one of several categories, each with its own suggestions for improving performance and energy efficiency. Third, the accuracy of the predicted bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, a trade-off analysis using architectural and application parameters becomes straightforward, which provides more insight into improving energy efficiency. In the future, a computer system will contain hundreds of heterogeneous cores, so it is mandatory that a workload be scheduled to an efficient core or distributed across both types of cores. Preliminary work that uses the analytical model to schedule between the CPU and GPU is demonstrated in the appendix. Since no profiling phase is required, the kernel code can be transformed to run more efficiently on the specific architecture. As another extension of the work, the relationship between speed-up and energy efficiency is derived mathematically. Finally, future research ideas are presented regarding the use of the model by programmers, compilers, and runtimes for future heterogeneous systems.
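
As a rough illustration of how an MWP/CWP formulation can turn into an execution-time estimate, the sketch below implements a heavily simplified version of the idea; the parameter names, equations, and example numbers are illustrative assumptions, not the thesis's exact model.

```python
# Simplified sketch of an MWP/CWP-style execution-time estimate.
# The equations are illustrative approximations, not the thesis's model;
# all parameters and numbers are assumptions.

def estimate_cycles(comp_cycles_per_warp, mem_cycles_per_warp,
                    mem_latency, departure_delay, active_warps):
    """Estimate per-core execution cycles for `active_warps` warps."""
    # MWP: how many warps can overlap their memory accesses on one core,
    # bounded by the memory latency/issue rate and by the warps available.
    mwp = min(mem_latency / departure_delay, active_warps)

    # CWP: how many warps' worth of computation fits under one memory period.
    cwp = min((mem_cycles_per_warp + comp_cycles_per_warp) / comp_cycles_per_warp,
              active_warps)

    if mwp >= cwp:
        # Computation-dominated: memory latency is largely hidden by other warps.
        return comp_cycles_per_warp * active_warps + mem_cycles_per_warp
    else:
        # Memory-dominated: memory periods are serialized in groups of size MWP.
        return mem_cycles_per_warp * (active_warps / mwp) + comp_cycles_per_warp


# Example with made-up numbers for a memory-bound kernel configuration.
print(estimate_cycles(comp_cycles_per_warp=40, mem_cycles_per_warp=400,
                      mem_latency=400, departure_delay=40, active_warps=24))
```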

Performance and Power Modeling of GPU Systems with Dynamic Voltage and Frequency Scaling

Author: Qiang Wang
Publisher:
ISBN:
Category : Computer systems
Languages : en
Pages : 141

Book Description
To address the ever-increasing demand for computing capacity, more and more heterogeneous systems have been designed to combine general-purpose and special-purpose processors. Their huge energy consumption raises new environmental concerns and challenges. Besides performance, energy efficiency is another key factor to be considered by system designers and consumers. In particular, contemporary graphics processing units (GPUs) support dynamic voltage and frequency scaling (DVFS) to balance computational performance and energy consumption. However, accurate and straightforward performance and power estimation for a given GPU kernel under different frequency settings is still lacking for real hardware, and such estimation is essential for determining the best frequency configuration for energy saving. In this thesis, we investigate how to improve the energy efficiency of GPU systems by accurately modeling the effects of GPU DVFS on the target GPU kernel. We also propose efficient algorithms to solve the communication contention problem when scheduling multiple distributed deep learning (DDL) jobs on GPU clusters. We introduce our studies as follows. First, we present EPPMiner, a benchmark suite for evaluating the performance, power, and energy of different heterogeneous systems. EPPMiner consists of 16 benchmark programs that cover a broad range of application domains and show great variety in how intensively they utilize the processors. We have implemented a prototype of EPPMiner that supports OpenMP, CUDA, and OpenCL, and demonstrated its usage through three showcases. The showcases confirm that GPUs provide much better energy efficiency than other types of computing systems, and in particular illustrate the effectiveness of GPU DVFS on the energy efficiency of GPU applications. Second, we present a fine-grained analytical model to estimate the execution time of GPU kernels under both core and memory frequency scaling. Unlike cycle-level simulators, which are too slow to apply to real hardware, our model only needs one-off micro-benchmarks to extract a set of hardware parameters and kernel performance counters, without any source-code analysis. Our experimental results show that the proposed performance model can capture kernel performance scaling behaviors under different frequency settings and achieve decent accuracy. Third, we design a cross-benchmarking suite that simulates kernels with a wide range of instruction distributions. The synthetic kernels generated by this suite can be used for model pre-training or as supplementary training samples. We then build machine learning models to predict the execution time and runtime power of a GPU kernel under different voltage and frequency settings. Validated on three modern GPUs with a wide frequency scaling range, using a collection of 24 real application kernels, the model trained only with our cross-benchmarking suite achieves considerably accurate results. Finally, we establish a new DDL job scheduling framework that organizes DDL jobs as directed acyclic graphs (DAGs) and considers communication contention between nodes. We propose an efficient job placement algorithm, Least-Workload-First- (LWF-), to balance GPU utilization and consolidate the allocated GPUs for each job, and we propose Ada-SRSF for scheduling the communication tasks to address the contention issue. Our simulation results show that LWF- achieves up to 1.59x improvement over classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by up to 36.7% compared to solutions that either avoid all communication contention or accept all of it.
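
As a rough sketch of the frequency/energy trade-off that such DVFS models are used to navigate, the toy example below picks the operating point that minimizes predicted energy under a simple assumed scaling model (compute time scales with 1/f, dynamic power with V^2 * f); the formulas, operating points, and kernel profile are illustrative assumptions, not the thesis's models.

```python
# Illustrative DVFS energy trade-off: pick the core frequency that minimizes
# predicted kernel energy under a simple assumed model. Runtime is split into
# a compute part that scales with 1/f_core and a memory part that does not;
# power has a static part plus a dynamic part proportional to V^2 * f.

def predicted_time(f_core, t_compute_ref, t_memory, f_ref):
    return t_compute_ref * (f_ref / f_core) + t_memory

def predicted_power(f_core, v_core, p_static, c_eff):
    return p_static + c_eff * v_core ** 2 * f_core

# Hypothetical frequency/voltage operating points (GHz, V) and kernel profile.
operating_points = [(0.8, 0.85), (1.0, 0.95), (1.2, 1.05), (1.4, 1.15)]
t_compute_ref, t_memory, f_ref = 2.0, 1.0, 1.0   # seconds at f_ref, seconds
p_static, c_eff = 30.0, 60.0                     # watts, effective switching term

best = min(
    operating_points,
    key=lambda fv: predicted_time(fv[0], t_compute_ref, t_memory, f_ref)
                   * predicted_power(fv[0], fv[1], p_static, c_eff),
)
print("energy-optimal operating point (GHz, V):", best)
```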

GPU Energy Modeling and Analysis

Author: Zain Asgar
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Over the past couple of decades, GPUs have enjoyed tremendous scaling in both functionality and performance by focusing on area-efficient processing. However, the slowdown in supply-voltage scaling has created a new hurdle to continued scaling of GPU performance: power consumption now limits the achievable GPU performance. Since GPUs already use many of the well-known hardware techniques for reducing power consumption, GPU designers need to start looking at architectural techniques to improve energy efficiency. To enable this exploration, we create an accurate power model of GPU architectures and apply it to explore two methods of saving power. As part of these studies, we examine overdraw (which occurs when a given pixel's value is computed more than once) and thread-level redundancy in the shader processor of the GPU. Using our model together with GPU performance data, we show that significant opportunities exist for improving energy efficiency. These studies demonstrate both the utility of our power model and the potential of architectural changes to make GPUs more energy efficient.
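
To make the notion of overdraw concrete, the snippet below computes a simple overdraw factor from per-pixel shading counts; the metric and the frame data are illustrative assumptions, not the dissertation's methodology.

```python
# Illustrative overdraw metric: average number of times each covered pixel's
# value is computed per frame. Values above 1.0 indicate redundant shading work
# that an energy-aware architecture could try to avoid. The frame data is made up.

def overdraw_factor(shade_counts_per_pixel):
    total_shades = sum(shade_counts_per_pixel)
    covered_pixels = sum(1 for c in shade_counts_per_pixel if c > 0)
    return total_shades / covered_pixels if covered_pixels else 0.0

# Hypothetical 8-pixel frame: several pixels are shaded more than once.
print(overdraw_factor([1, 2, 1, 3, 1, 1, 2, 1]))  # -> 1.5
```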

Energy Efficient High Performance Processors

Author: Jawad Haj-Yahya
Publisher: Springer
ISBN: 9811085544
Category : Technology & Engineering
Languages : en
Pages : 176

Book Description
This book explores energy-efficiency techniques for high-performance computing (HPC) systems based on power-management methods. Adopting a step-by-step approach, it describes the power-management flows, algorithms, and mechanisms employed in modern processors such as Intel Sandy Bridge, Haswell, and Skylake, as well as other architectures (e.g. ARM). It includes practical examples and recent studies demonstrating how modern processors dynamically manage wide power ranges, from a few milliwatts in the lowest idle power state to tens of watts in turbo state, and it explains how thermal management and power delivery are handled across this huge power range. The book also discusses the different metrics for energy efficiency, presents several methods and applications of power and energy estimation, and shows how, by using innovative power-estimation methods and new algorithms, modern processors are able to optimize metrics such as power, energy, and performance. Different power-estimation tools are presented, including tools that break down the power consumption of modern processors at sub-processor core/thread granularity. The book also investigates software, firmware, and hardware coordination methods for reducing power consumption, for example a compiler-assisted power-management method to overcome power excursions. Lastly, it examines firmware algorithms for dynamic cache resizing and dynamic voltage and frequency scaling (DVFS) for memory sub-systems.
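
As a small illustration of the energy-efficiency metrics discussed in this context, the sketch below evaluates energy and energy-delay product for two hypothetical operating points using the classic dynamic-power approximation P_dyn ~ C * V^2 * f; all numbers are made up and are not taken from the book.

```python
# Two common energy-efficiency metrics: energy (P * t) and energy-delay
# product (P * t^2), using the classic dynamic-power approximation
# P_dyn ~= C * V^2 * f. All parameters and numbers are illustrative.

def dynamic_power(c_eff, voltage, freq_hz):
    return c_eff * voltage ** 2 * freq_hz

def energy_and_edp(power_watts, runtime_s):
    energy_j = power_watts * runtime_s
    return energy_j, energy_j * runtime_s  # (energy, energy-delay product)

# Hypothetical comparison of a nominal and a turbo operating point (V, Hz, s).
for label, (v, f, t) in {"nominal": (0.9, 2.0e9, 10.0),
                         "turbo":   (1.1, 3.0e9, 7.5)}.items():
    p = dynamic_power(c_eff=5e-9, voltage=v, freq_hz=f) + 5.0  # + static watts
    print(label, energy_and_edp(p, t))
```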

Energy-Efficient Computing and Data Centers

Author: Luigi Brochard
Publisher: John Wiley & Sons
ISBN: 1119648807
Category : Computers
Languages : en
Pages : 234

Book Description
Data centers consume roughly 1% of total electricity demand, while ICT as a whole consumes around 10%. Demand is growing exponentially and, left unchecked, is estimated to grow to 20% or more by 2030. This book covers the energy consumption and minimization of the different data center components when running real workloads, taking into account the types of instructions executed by the servers. It presents the different air- and liquid-cooled technologies for servers and data centers with real examples, including waste-heat reuse through adsorption chillers, as well as the hardware and software used to measure, model, and control energy. It computes and compares the Power Usage Effectiveness and the Total Cost of Ownership of new and existing data centers with different cooling designs, including free cooling and waste-heat reuse leading to the Energy Reuse Effectiveness. The book concludes by demonstrating how a well-designed data center that reuses waste heat to produce chilled water can reduce energy consumption by roughly 50%, and how renewable energy can be used to create net-zero-energy data centers.
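
The two efficiency metrics named above have simple definitions, illustrated below for a hypothetical data center; the energy figures are made up, and the formulas are the standard PUE and ERE definitions rather than anything specific to this book.

```python
# Power Usage Effectiveness (PUE) and Energy Reuse Effectiveness (ERE) for a
# hypothetical data center. Energy figures (MWh over some period) are made up.

def pue(total_facility_energy, it_equipment_energy):
    # PUE = total facility energy / IT equipment energy (1.0 is the ideal).
    return total_facility_energy / it_equipment_energy

def ere(total_facility_energy, it_equipment_energy, reused_energy):
    # ERE credits waste heat that is reused outside the data center.
    return (total_facility_energy - reused_energy) / it_equipment_energy

it, total, reused = 1000.0, 1400.0, 300.0
print(f"PUE = {pue(total, it):.2f}, ERE = {ere(total, it, reused):.2f}")
```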

Predictive Modeling and Optimization for Energy-efficient GPU Computing

Author: Kaijie Fan
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description


High Performance Computing

Author: Michela Taufer
Publisher: Springer
ISBN: 331946079X
Category : Computers
Languages : en
Pages : 710

Book Description
This book constitutes revised selected papers from 7 workshops that were held in conjunction with the ISC High Performance 2016 conference in Frankfurt, Germany, in June 2016. The 45 papers presented in this volume were carefully reviewed and selected for inclusion in this book. They stem from the following workshops: Workshop on Exascale Multi/Many Core Computing Systems, E-MuCoCoS; Second International Workshop on Communication Architectures at Extreme Scale, ExaComm; HPC I/O in the Data Center Workshop, HPC-IODC; International Workshop on OpenPOWER for HPC, IWOPH; Workshop on the Application Performance on Intel Xeon Phi – Being Prepared for KNL and Beyond, IXPUG; Workshop on Performance and Scalability of Storage Systems, WOPSSS; and International Workshop on Performance Portable Programming Models for Accelerators, P3MA.

Performance and Power Optimization of GPU Architectures for General-purpose Computing

Author: Yue Wang
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The other technique targets maximizing the average throughput of all parallel processors under dynamic power constraints. We formalize this target as a linear programming problem and solve it at runtime. According to the simulation results, the first technique achieves more than 22% power savings with a 4% improvement in performance, and the second technique saves 11% of power consumption with a 9% performance improvement. The contributions of this dissertation represent a significant advancement in the quest to improve the performance and reduce the energy consumption of GPGPUs.
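
As a toy illustration of the kind of linear program described above, the sketch below maximizes total throughput across a few processor clusters subject to a power budget using SciPy; the coefficients, bounds, and solver choice are assumptions for illustration, not the dissertation's actual formulation.

```python
# Toy version of the throughput-maximization linear program: choose per-cluster
# frequency levels x_i to maximize total throughput subject to a power budget.
# All coefficients and bounds are illustrative assumptions.
from scipy.optimize import linprog

throughput_per_level = [3.0, 2.5, 2.0, 1.5]   # throughput gained per frequency step
power_per_level = [8.0, 6.0, 5.0, 3.0]        # watts consumed per frequency step
power_budget = 40.0

# linprog minimizes, so negate the throughput coefficients to maximize.
result = linprog(
    c=[-t for t in throughput_per_level],
    A_ub=[power_per_level],
    b_ub=[power_budget],
    bounds=[(0, 4)] * 4,                      # each cluster has frequency levels 0..4
    method="highs",
)
print("frequency levels:", result.x, "max throughput:", -result.fun)
```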

General-Purpose Graphics Processor Architectures

Author: Tor M. Aamodt
Publisher: Morgan & Claypool Publishers
ISBN: 1627056181
Category : Computers
Languages : en
Pages : 142

Book Description
Originally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to the mining of cryptocurrencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators. This book provides an introduction for those interested in studying the architecture of GPUs that support general-purpose computing. It collects together information currently found only among a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator, which is widely used in academic research on GPU architectures. The first chapter describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 also provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and the memory system. This book should provide a valuable resource for those wishing to understand the architecture of graphics processor units (GPUs) used for the acceleration of general-purpose applications, and for those who want an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.

High Performance Computing

Author: Esteban Meneses
Publisher: Springer
ISBN: 3030162052
Category : Computers
Languages : en
Pages : 338

Book Description
This book constitutes the proceedings of the 5th Latin American Conference, CARLA 2018, held in Bucaramanga, Colombia, in September 2018. The 24 papers presented in this volume were carefully reviewed and selected from 38 submissions. They are organized in topical sections on: Artificial Intelligence; Accelerators; Applications; Performance Evaluation; Platforms and Infrastructures; Cloud Computing.