Optimizations for Energy Efficiency in GPGPU Architectures

Author: Alamelu Sankaranarayanan
Publisher:
ISBN: 9781339626451
Category :
Languages : en
Pages : 130

Book Description
It is commonplace for graphics processing units or GPUs today to render extremely complex 3D scenes and textures, in real time, both in the traditional and mobile computing spaces. The computational power required to do this makes them a valuable resource to exploit for general purpose computation. In order to map programs originally designed for sequential CPUs onto massively parallel GPU architectures, it would be necessary to justify the transition with huge performance benefits. Over the last couple of years, there have been numerous proposals to improve the performance of GPUs used for general purpose computing (GPGPUs), but without much consideration for energy efficiency.

Modeling Performance and Power for Energy-efficient GPGPU Computing

Author: Sunpyo Hong
Publisher:
ISBN:
Category : Computer architecture
Languages : en
Pages :

Book Description
The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures, and further to propose a mechanism that leverages the analytical model to enable energy-efficient execution of an application. The key insight of the model is to investigate and quantify the complex relationship that exists between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), which refers to the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes an application type. By using these metrics in addition to the architectural and application parameters, the overall application performance is predicted. The model uses statically available parameters such as instruction-mixture information and input-data size, and the prediction error is 13.3% for the GPU-computing benchmarks. Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Using the proposed integrated power and performance (IPP) framework, the results show that different optimization points exist for a GPU architecture depending on the application type. The work shows that by activating fewer cores, 10.99% of run-time energy consumption can be saved for the bandwidth-limited benchmarks, and a projection of 25.8% energy savings is predicted when power-gating at core level is employed. Finally, the model is shifted to throughput using OpenCL to target a wider variety of processors. First, multiple outputs relating to performance are predicted, including upper-bound and lower-bound values. Second, by using the model parameters, an application can be assigned to a category, each with its own suggestions for improving performance and energy efficiency. Third, the accuracy of the bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, a trade-off analysis using architectural and application parameters is straightforward, which provides more insight into improving energy efficiency. In the future, a computer system will contain hundreds of heterogeneous cores. Hence, a workload must be scheduled to an efficient core or distributed across both types of cores. Preliminary work that uses the analytical model to schedule between CPU and GPU is demonstrated in the appendix. Since a profiling phase is not required, the kernel code can be transformed to run more efficiently on the specific architecture. As another extension of the work, the relationship between speed-up and energy efficiency is mathematically derived. Finally, future research ideas are presented regarding the use of the model by the programmer, compiler, and runtime for future heterogeneous systems.

Energy Efficient High Performance Processors

Author: Jawad Haj-Yahya
Publisher: Springer
ISBN: 9811085544
Category : Technology & Engineering
Languages : en
Pages : 176

Book Description
This book explores energy efficiency techniques for high-performance computing (HPC) systems using power-management methods. Adopting a step-by-step approach, it describes power-management flows, algorithms and mechanisms that are employed in modern processors such as Intel Sandy Bridge, Haswell, Skylake and other architectures (e.g. ARM). Further, it includes practical examples and recent studies demonstrating how modern processors dynamically manage wide power ranges, from a few milliwatts in the lowest idle power state, to tens of watts in turbo state. Moreover, the book explains how thermal and power deliveries are managed in the context of this huge power range. The book also discusses the different metrics for energy efficiency, presents several methods and applications of power and energy estimation, and shows how, by using innovative power estimation methods and new algorithms, modern processors are able to optimize metrics such as power, energy, and performance. Different power estimation tools are presented, including tools that break down the power consumption of modern processors at sub-processor core/thread granularity. The book also investigates software, firmware and hardware coordination methods of reducing power consumption, for example a compiler-assisted power management method to overcome power excursions. Lastly, it examines firmware algorithms for dynamic cache resizing and dynamic voltage and frequency scaling (DVFS) for memory sub-systems.

Performance and Power Optimization of GPU Architectures for General-purpose Computing

Author: Yue Wang
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The other technique targets maximizing the average throughput of all parallel processors under dynamic power constraints. We formalize this target as a linear programming problem and solve it at runtime. According to the simulation results, the first technique achieves more than 22% power savings with a 4% improvement in performance, and the second technique saves 11% of power consumption with a 9% performance improvement. The contributions of this dissertation represent a significant advancement in the quest for improving performance and reducing the energy consumption of GPGPUs.

Algorithms and Architectures for Parallel Processing

Author: Weizhi Meng
Publisher: Springer Nature
ISBN: 3031226771
Category : Computers
Languages : en
Pages : 818

Book Description
This book constitutes the refereed proceedings of the 22nd International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2022, which was held in October 2022. Due to the COVID-19 pandemic, the conference was held virtually. The 33 full papers and 10 short papers presented were carefully reviewed and selected from 91 submissions. The papers cover many dimensions of parallel algorithms and architectures, encompassing fundamental theoretical approaches, practical experimental projects, and commercial components and systems.

Algorithms and Architectures for Parallel Processing

Author: Jesus Carretero
Publisher: Springer
ISBN: 3319495836
Category : Computers
Languages : en
Pages : 695

Book Description
This book constitutes the refereed proceedings of the 16th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2016, held in Granada, Spain, in December 2016. The 30 full papers and 22 short papers presented were carefully reviewed and selected from 117 submissions. They cover many dimensions of parallel algorithms and architectures, encompassing fundamental theoretical approaches, practical experimental projects, and commercial components and systems trying to push beyond the limits of existing technologies, including experimental efforts, innovative systems, and investigations that identify weaknesses in existing parallel processing technology.

Kernel-Based Energy Optimization In GPUs

Author: Amin Jadidi
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Emerging GPU architectures offer a cost-effective computing platform by providing thousands of energy-efficient compute cores and high bandwidth memory that facilitate the execution of highly parallel applications. In this paper, we show that different applications, and in fact different kernels from the same application, might exhibit significantly varying utilizations of compute and memory resources. However, we observe that the same kernel displays similar behavior in its different invocations; moreover, most of the kernels are invoked many times during the course of execution. By exploiting these properties of kernels, in order to improve the energy efficiency of the GPU system, we propose a dynamic resource configuration strategy that classifies kernels as compute-intensive or memory-intensive based on their resource utilizations and dynamically employs memory voltage/frequency scaling or core shut-down techniques for compute- and memory-intensive kernels, respectively. This strategy uses performance and memory bandwidth utilization information from the first few invocations of a kernel to determine the optimal hardware configuration for future invocations. Experimental evaluations show that our strategy saves about 20% of total chip energy and 70% of total memory leakage power for memory and compute-intensive kernels respectively, which are within 8% of the optimal savings that can be obtained from an oracle scheme.
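
The strategy described above (profile the first few invocations of a kernel, classify it, then reuse the chosen configuration) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class and function names, the three-invocation profiling window, the 0.6 bandwidth-utilization threshold, and the configuration labels are all assumptions made for illustration, and the classification here uses memory bandwidth utilization only.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional

PROFILE_INVOCATIONS = 3   # assumed: number of early invocations to observe
BW_THRESHOLD = 0.6        # assumed: average utilization at or above this => memory-intensive


@dataclass
class KernelProfile:
    bw_samples: List[float] = field(default_factory=list)
    config: Optional[str] = None  # decided once profiling is complete


def choose_config(avg_bw_utilization: float) -> str:
    """Map a kernel class to a power-saving action (labels are hypothetical)."""
    if avg_bw_utilization >= BW_THRESHOLD:
        # Memory-intensive: extra cores mostly wait on memory, so shut some down.
        return "shut_down_some_cores"
    # Compute-intensive: memory is underused, so lower its voltage/frequency.
    return "scale_down_memory_vf"


class ResourceManager:
    def __init__(self) -> None:
        self.profiles: Dict[str, KernelProfile] = {}

    def on_kernel_finish(self, name: str, bw_utilization: float) -> Optional[str]:
        """Record one invocation; fix the configuration after the profiling window."""
        profile = self.profiles.setdefault(name, KernelProfile())
        if profile.config is None:
            profile.bw_samples.append(bw_utilization)
            if len(profile.bw_samples) >= PROFILE_INVOCATIONS:
                profile.config = choose_config(mean(profile.bw_samples))
        return profile.config

    def config_for(self, name: str) -> str:
        """Configuration to apply to the next invocation of this kernel."""
        profile = self.profiles.get(name)
        if profile is None or profile.config is None:
            return "default"  # still profiling: run with the baseline configuration
        return profile.config
```

Because the same kernel tends to behave similarly across invocations, the decision is made once per kernel name and then reused, which keeps the runtime cost of the scheme to a dictionary lookup per launch.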

Energy Efficiency and Robustness of Advanced Machine Learning Architectures

Author: Alberto Marchisio
Publisher: CRC Press
ISBN: 1040165036
Category : Computers
Languages : en
Pages : 361

Book Description
Machine Learning (ML) algorithms have shown a high level of accuracy, and applications are widely used in many systems and platforms. However, developing efficient ML-based systems requires addressing three problems: energy-efficiency, robustness, and techniques that typically focus on optimizing for a single objective/have a limited set of goals. This book tackles these challenges by exploiting the unique features of advanced ML models and investigates cross-layer concepts and techniques to engage both hardware and software-level methods to build robust and energy-efficient architectures for these advanced ML networks. More specifically, this book improves the energy efficiency of complex models like CapsNets, through a specialized flow of hardware-level designs and software-level optimizations exploiting the application-driven knowledge of these systems and the error tolerance through approximations and quantization. This book also improves the robustness of ML models, in particular for SNNs executed on neuromorphic hardware, due to their inherent cost-effective features. This book integrates multiple optimization objectives into specialized frameworks for jointly optimizing the robustness and energy efficiency of these systems. This is an important resource for students and researchers of computer and electrical engineering who are interested in developing energy efficient and robust ML.

Power-Efficient Computer Architectures

Author: Magnus Själander
Publisher: Morgan & Claypool Publishers
ISBN: 1627056467
Category : Computers
Languages : en
Pages : 98

Book Description
As Moore's Law and Dennard scaling trends have slowed, the challenges of building high-performance computer architectures while maintaining acceptable power efficiency levels have heightened. Over the past ten years, architecture techniques for power efficiency have shifted from primarily focusing on module-level efficiencies, toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture. Table of Contents: Introduction / Voltage and Frequency Management / Heterogeneity and Specialization / Communication and Memory Systems / Conclusions / Bibliography / Authors' Biographies

Temporal Memoization for Energy-efficient Timing Error Recovery in GPGPU Architectures

Author: Abbas Rahimi
Publisher:
ISBN:
Category :
Languages : en
Pages : 27

Book Description
Manufacturing and environmental variability lead to timing errors in computing systems that are typically corrected by error detection and correction mechanisms at the circuit level. The cost and speed of recovery can be improved by memoization-based optimization methods that exploit spatial or temporal parallelisms in suitable computing fabrics such as general-purpose graphics processing units (GPGPUs). We propose here a temporal memoization technique for use in floating-point units (FPUs) in GPGPUs that uses value locality inside data-parallel programs. The technique recalls (memorizes) the context of error-free execution of an instruction on an FPU. Therefore, it avoids redundant execution and saves energy for the FPU. To enable scalable and independent recovery, a single-cycle lookup table (LUT) is tightly coupled to every FPU to maintain contexts of recent error-free executions. The LUT reuses these memorized contexts to exactly, or approximately, correct errant FP instructions based on application needs. In real-world applications, the temporal memoization technique achieves an average energy saving of 13%-25% for a wide range of timing error rates (0%-4%) and outperforms recent advances in resilient architectures. This technique also enhances robustness in the voltage overscaling regime and achieves a relative average energy saving of 44% with 11% voltage overscaling.
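
A software model can make the LUT behavior described above more concrete. The Python sketch below is illustrative only and is not the hardware design from the report: the `MemoLUT` and `execute` names, the two-entry FIFO capacity, and the approximate-matching scheme (ignoring a configurable number of low-order bits of each single-precision operand) are assumptions.

```python
import struct
from collections import OrderedDict


def _masked_bits(x: float, masked_bits: int) -> int:
    """Float32 bit pattern of x with the lowest `masked_bits` bits cleared."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits & (~((1 << masked_bits) - 1) & 0xFFFFFFFF)


class MemoLUT:
    """Small FIFO table of recent error-free FP executions (assumed organization)."""

    def __init__(self, entries: int = 2, masked_bits: int = 0) -> None:
        self.entries = entries
        self.masked_bits = masked_bits  # 0 = exact matching, >0 = approximate matching
        self.table = OrderedDict()

    def _key(self, op: str, a: float, b: float):
        return (op, _masked_bits(a, self.masked_bits), _masked_bits(b, self.masked_bits))

    def record(self, op: str, a: float, b: float, result: float) -> None:
        """Store the context of an error-free execution, evicting the oldest entry."""
        key = self._key(op, a, b)
        if key not in self.table and len(self.table) >= self.entries:
            self.table.popitem(last=False)
        self.table[key] = result

    def lookup(self, op: str, a: float, b: float):
        return self.table.get(self._key(op, a, b))


def execute(lut: MemoLUT, op: str, a: float, b: float, fpu, timing_error: bool):
    """Return (result, how). A LUT hit is reused whether or not an error occurred."""
    hit = lut.lookup(op, a, b)
    if hit is not None:
        return hit, "reused"      # no FPU execution and no recovery needed
    if timing_error:
        # Miss on an errant instruction: fall back to baseline recovery,
        # modeled here as a plain re-execution that is not memorized.
        return fpu(a, b), "recovered"
    result = fpu(a, b)
    lut.record(op, a, b, result)  # memorize only error-free executions
    return result, "executed"
```

With `masked_bits=0` the table corrects errant instructions exactly; a larger value trades accuracy for a higher hit rate, which corresponds to the approximate correction the description mentions for error-tolerant applications.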