Parallel Computation with Fast Algorithms for Micromagnetic Simulations on GPUs

Parallel Computation with Fast Algorithms for Micromagnetic Simulations on GPUs PDF Author: Sidi Fu
Publisher:
ISBN:
Category :
Languages : en
Pages : 159


Book Description
Micromagnetics is the field of study concerned with magnetization behavior in magnetic materials and devices, accounting for a wide set of interactions and describing magnetization phenomena from the atomistic scale up to several hundreds of microns. Micromagnetic simulations are essential in understanding the behavior of many magnetic systems. Modeling complex structures can require significant computational time, and in some cases the system complexity can make simulations prohibitively long or require prohibitively large memory. In this thesis, we present a set of methods and their implementations that resulted in high-performance numerical micromagnetic tools for modeling highly complex magnetic materials and devices. The focus of the dissertation is on solving the Landau-Lifshitz-Gilbert (LLG) equation efficiently, both with numerical methods and with advanced hardware acceleration. To introduce the numerical problem to be solved, Chapter 1 addresses the LLG equation and the governing interactions involved, as well as the basics of numerical modeling with the Finite Difference Method (FDM) and the Finite Element Method (FEM). Chapter 1 also presents a versatile micromagnetic framework, referred to as FastMag, which implements some of these methods. Chapter 2 provides a detailed description of computing based on Graphics Processing Units (GPUs). The history of the GPU programming model and practical programming tips serve as the basis for understanding parallel computing on GPUs. The chapter presents applications of GPUs on various platforms to demonstrate their current mainstream usage and promising future development directions, and it also summarizes applications of GPUs in micromagnetics. Chapters 3 and 4 address two essential aspects of micromagnetic solvers: fast algorithms for computing the key interaction components and efficient time integration methods.
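For reference, the Gilbert form of the LLG equation that the chapter summary refers to can be written as follows (a standard statement of the equation, not quoted from the thesis):

```latex
\frac{\partial \mathbf{M}}{\partial t}
  = -\gamma\, \mathbf{M} \times \mathbf{H}_{\mathrm{eff}}
  + \frac{\alpha}{M_s}\, \mathbf{M} \times \frac{\partial \mathbf{M}}{\partial t}
```

where \(\gamma\) is the gyromagnetic ratio, \(\alpha\) the dimensionless Gilbert damping constant, \(M_s\) the saturation magnetization, and \(\mathbf{H}_{\mathrm{eff}}\) the effective field collecting the magnetostatic, exchange, anisotropy, and external-field contributions.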
Chapter 3 introduces a non-uniform fast Fourier transform (NUFFT) method, a scalar potential method, and sparse matrix-vector multiplication (SpMVM) algorithms implemented on GPUs to accelerate the computation of the magnetostatic and exchange interactions. Chapter 4 addresses the basics of the time integration methods used in FastMag as well as a preconditioner that further accelerates the time integration process. Chapter 5 presents a numerical model of a current state-of-the-art magnetic recording system using the advanced algorithms and GPU implementations described in Chapters 2-4.
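The SpMVM kernel mentioned above is, in serial form, a row-wise sweep over a compressed-sparse-row (CSR) matrix. A minimal pure-NumPy sketch (illustrative only, not the FastMag implementation):

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR-format sparse matrix by a dense vector x.

    data    -- nonzero values, stored row by row
    indices -- column index of each nonzero
    indptr  -- indptr[i]:indptr[i+1] delimits row i's nonzeros
    """
    y = np.zeros(len(indptr) - 1)
    for i in range(len(indptr) - 1):
        start, end = indptr[i], indptr[i + 1]
        # dot product of row i's nonzeros with the matching entries of x
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# 3x3 example matrix: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
data = np.array([2.0, 1.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
y = csr_matvec(data, indices, indptr, np.array([1.0, 1.0, 1.0]))
```

On a GPU the outer loop disappears: each row (or a warp per row) is handled by its own thread, which is where the parallel speedup of SpMVM comes from.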

Fast Algorithms and Solvers in Computational Electromagnetics and Micromagnetics on GPUs

Fast Algorithms and Solvers in Computational Electromagnetics and Micromagnetics on GPUs PDF Author: Shaojing Li
Publisher:
ISBN: 9781267685155
Category :
Languages : en
Pages : 217


Book Description
In this thesis, fast algorithms for solving fields defined by the Helmholtz equation using integral equation methods are developed and implemented on Graphics Processing Units (GPUs). GPUs are massively parallel processors that offer tens or even hundreds of times the floating-point computing capability of current-generation CPUs. A short history of GPUs is given and their unique architecture is described in detail. On this new hardware architecture, algorithms like the hierarchical Non-uniform Grid Interpolation Method (NGIM) and the FFT-based Adaptive Integral Method (AIM) have to be significantly changed from their original sequential forms to achieve high performance. Specifically, the computational domains of the problems are divided into boxes, homogenizing the computing burden across the wide SIMD-style streaming multiprocessors. Computing operations are reformulated and reorganized to exploit the enormous floating-point computing power while at the same time minimizing data transfer latencies. The achieved computing performance on commercial GPUs is generally two orders of magnitude higher than that on state-of-the-art CPUs, with much lower memory consumption. Based on these fast algorithms, an ultra-fast micromagnetic solver with linear computational complexity is built. This solver, named FastMag, runs on desktop workstations with one or several GPU cards and is able to simulate magnetic systems with over one hundred million degrees of freedom. Electromagnetic solvers that use slightly different algorithms are also implemented and provide impressive performance on general electromagnetic problems such as wave scattering. This electromagnetic solver is also capable of handling periodic boundary problems using a new algorithm called the Fast Periodic Interpolation Method (FPIM). This algorithm makes significant use of spatial interpolation as well as the FFT to reduce the time needed to evaluate fields generated by infinitely periodic structures.
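The FFT acceleration behind AIM-style methods rests on one fact: on a uniform grid with a translation-invariant kernel, the O(N²) all-pairs field sum is a convolution, which a zero-padded FFT evaluates in O(N log N). A hypothetical 1-D sketch with a toy kernel (not the thesis's Helmholtz kernel) comparing the two:

```python
import numpy as np

# sources on a uniform 1-D grid, with a translation-invariant kernel k(|i-j|)
n = 64
rng = np.random.default_rng(0)
q = rng.standard_normal(n)                 # source amplitudes
k = 1.0 / (1.0 + np.arange(n))             # toy kernel: k[d] = 1/(1+d)

# direct O(N^2) evaluation: field[i] = sum_j k(|i-j|) * q[j]
field_direct = np.array([sum(k[abs(i - j)] * q[j] for j in range(n))
                         for i in range(n)])

# FFT evaluation: embed the symmetric kernel in a length-2n circulant row,
# zero-pad the sources, and multiply the spectra
kc = np.concatenate([k, [0.0], k[:0:-1]])  # k(0..n-1), pad, k(n-1..1)
qp = np.concatenate([q, np.zeros(n)])
field_fft = np.fft.ifft(np.fft.fft(kc) * np.fft.fft(qp)).real[:n]
```

The circulant embedding (the `kc` line) is exactly the trick that makes circular convolution reproduce the linear all-pairs sum; AIM applies the same idea on a 3-D auxiliary grid.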
Using the previously developed micromagnetic solvers, the author investigated two novel magnetic recording systems that might be useful in next-generation ultra-high-density magnetic recording. Capped bit-patterned media (CBPM) are proposed to have lower reversal fields, lower switching field distributions, and better readback signals. The reversal mechanisms of bit-patterned media under the influence of microwaves are also investigated, leading to a proposed multi-layer recording system using microwave-assisted magnetic recording (MAMR) technology.

GPU Computing Gems Emerald Edition

GPU Computing Gems Emerald Edition PDF Author:
Publisher: Elsevier
ISBN: 0123849896
Category : Computers
Languages : en
Pages : 889


Book Description
GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book offers the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing. This book is intended to help those who are facing the challenge of programming systems to effectively use GPUs to achieve efficiency and performance goals. It offers developers a window into diverse application areas and the opportunity to gain insights from others' algorithm work that they may apply to their own projects. Readers will learn from the leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights and ideas as well as practical hands-on skills in the book can be immediately put to use. Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. For the source code discussed throughout the book, the editors invite readers to the following website: ...
- Covers the breadth of industry from scientific simulation and electronic design automation to audio/video processing, medical imaging, computer vision, and more
- Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution
- Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use

CUDA Programming

CUDA Programming PDF Author: Shane Cook
Publisher: Newnes
ISBN: 0124159885
Category : Computers
Languages : en
Pages : 591


Book Description
If you need to learn CUDA but don't have experience with parallel computing, CUDA Programming: A Developer's Introduction offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then delves into CUDA installation. Chapters on core concepts, including threads, blocks, grids, and memory, focus on both parallel and CUDA-specific issues. Later, the book demonstrates CUDA in practice for optimizing applications, adjusting to new hardware, and solving common problems.
- Comprehensive introduction to parallel programming with CUDA, for readers new to both
- Detailed instructions help readers optimize the CUDA software development kit
- Practical techniques illustrate working with memory, threads, algorithms, resources, and more
- Covers CUDA on multiple hardware platforms: Mac, Linux, and Windows with several NVIDIA chipsets
- Each chapter includes exercises to test reader knowledge
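The thread/block/grid hierarchy those core chapters cover boils down to one indexing rule: each thread derives a global index from its block and thread coordinates, `gid = blockIdx.x * blockDim.x + threadIdx.x`. A CPU-side Python mock of a 1-D launch (hypothetical helper names, not CUDA syntax) makes the rule and the usual bounds guard concrete:

```python
def launch_1d(grid_dim, block_dim, kernel, *args):
    """Emulate a 1-D CUDA launch: run `kernel` once per (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # the canonical CUDA global-index formula:
            # gid = blockIdx.x * blockDim.x + threadIdx.x
            gid = block_idx * block_dim + thread_idx
            kernel(gid, *args)

def saxpy_kernel(gid, a, x, y, out):
    if gid < len(x):          # guard: the grid may overshoot the array
        out[gid] = a * x[gid] + y[gid]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
# 2 blocks of 4 threads cover 5 elements; 3 threads hit the guard and do nothing
launch_1d(2, 4, saxpy_kernel, 2.0, x, y, out)
```

On real hardware the two loops run concurrently across streaming multiprocessors, which is why every CUDA kernel needs the bounds check the mock includes.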

Massively Parallel Evolutionary Computation on GPGPUs

Massively Parallel Evolutionary Computation on GPGPUs PDF Author: Shigeyoshi Tsutsui
Publisher: Springer Science & Business Media
ISBN: 3642379591
Category : Computers
Languages : en
Pages : 454


Book Description
Evolutionary algorithms (EAs) are metaheuristics inspired by natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened up this possibility for parallel EAs, and this is the first book dedicated to this exciting development. The three chapters of Part I are tutorials, representing a comprehensive introduction to the approach, explaining the characteristics of the hardware used, and presenting a representative project to develop a platform for automatic parallelization of evolutionary computing (EC) on GPGPUs. The 10 chapters in Part II focus on how to consider key EC approaches in the light of this advanced computational technique, in particular addressing generic local search, tabu search, genetic algorithms, differential evolution, swarm optimization, ant colony optimization, systolic genetic search, genetic programming, and multiobjective optimization. The 6 chapters in Part III present successful results from real-world problems in data mining, bioinformatics, drug discovery, crystallography, artificial chemistries, and Sudoku. Although the parallelism of EAs is suited to the single-instruction multiple-data (SIMD)-based GPU, there are many issues to be resolved in design and implementation, and a key feature of the contributions is the practical engineering advice offered. This book will be of value to researchers, practitioners, and graduate students in the areas of evolutionary computation and scientific computing.
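As a concrete anchor for the genetic algorithms the Part II chapters address, here is a minimal generational GA on a toy "OneMax" problem (maximize the number of 1-bits), in plain serial Python; the book's versions parallelize the per-individual fitness evaluations, which map naturally onto one GPU thread each. All names and parameter values here are illustrative:

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=40, generations=60,
                      p_mut=0.02, seed=1):
    """Maximize `fitness` over bitstrings using tournament selection,
    one-point crossover, and per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            # tournament selection: best of 3 random individuals, twice
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            # per-bit mutation: flip each bit with probability p_mut
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax: fitness is simply the bit sum
best = genetic_algorithm(sum)
```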

Applied Parallel and Scientific Computing

Applied Parallel and Scientific Computing PDF Author: Kristján Jónasson
Publisher: Springer Science & Business Media
ISBN: 3642281508
Category : Computers
Languages : en
Pages : 364


Book Description
The two volume set LNCS 7133 and LNCS 7134 constitutes the thoroughly refereed post-conference proceedings of the 10th International Conference on Applied Parallel and Scientific Computing, PARA 2010, held in Reykjavík, Iceland, in June 2010. These volumes contain three keynote lectures, 29 revised papers and 45 minisymposia presentations arranged on the following topics: cloud computing, HPC algorithms, HPC programming tools, HPC in meteorology, parallel numerical algorithms, parallel computing in physics, scientific computing tools, HPC software engineering, simulations of atomic scale systems, tools and environments for accelerator based computational biomedicine, GPU computing, high performance computing interval methods, real-time access and processing of large data sets, linear algebra algorithms and software for multicore and hybrid architectures in honor of Fred Gustavson on his 75th birthday, memory and multicore issues in scientific computing - theory and praxis, multicore algorithms and implementations for application problems, fast PDE solvers and a posteriori error estimates, and scalable tools for high performance computing.

Advanced Optimization Techniques for MT Simulation on GPUs

Advanced Optimization Techniques for MT Simulation on GPUs PDF Author: Eyad Hailat
Publisher: LAP Lambert Academic Publishing
ISBN: 9783659554636
Category :
Languages : en
Pages : 132


Book Description
The objective of this work is to design and implement a self-adaptive, parallel, GPU-optimized Monte Carlo algorithm for the simulation of adsorption in porous materials. We focus on NVIDIA GPUs, and specifically on the Fermi architecture, using CUDA. The resulting package supports different ensemble methods for Monte Carlo simulation, which allow for the simulation of multi-component adsorption in porous solids. Such an algorithm has broad applications to the development of novel porous materials for the sequestration of CO2 and the filtration of toxic industrial chemicals. The primary objective of this work is the release of a massively parallel open-source Monte Carlo simulation engine implemented using GPUs, called GOMC. The code utilizes the canonical ensemble and the Gibbs ensemble method, which allow for the simulation of multiple phenomena, including liquid-vapor phase coexistence and single- and multi-component adsorption in porous materials. In addition, the grand canonical ensemble and configurational-bias algorithms have been implemented so that polymeric materials and small proteins may be simulated.
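For context on the grand canonical moves mentioned above: in standard GCMC (this is the textbook result, not quoted from this work), a trial particle insertion is accepted with probability

```latex
P_{\mathrm{acc}}^{\mathrm{ins}}
  = \min\!\left[1,\;
    \frac{V}{\Lambda^{3}\,(N+1)}\; e^{\beta\mu}\; e^{-\beta\,\Delta U}\right]
```

where \(V\) is the box volume, \(N\) the current particle count, \(\Lambda\) the thermal de Broglie wavelength, \(\mu\) the imposed chemical potential, \(\beta = 1/k_B T\), and \(\Delta U\) the energy change of the trial move; the deletion move uses the corresponding inverse factor. The \(\Delta U\) evaluation is the all-pairs energy sum that GPU parallelization targets.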

Fast Parallel Machine Learning Algorithms for Large Datasets Using Graphic Processing Unit

Fast Parallel Machine Learning Algorithms for Large Datasets Using Graphic Processing Unit PDF Author: Qi Li
Publisher:
ISBN:
Category : Machine learning
Languages : en
Pages :


Book Description
This dissertation deals with developing parallel processing algorithms for the Graphics Processing Unit (GPU) in order to solve machine learning problems for large datasets. In particular, it contributes to the development of fast GPU-based algorithms for calculating distance (i.e., similarity, affinity, closeness) matrices. It also presents the algorithm and implementation of a fast parallel Support Vector Machine (SVM) using the GPU. These tools are developed using the Compute Unified Device Architecture (CUDA), a popular software framework for General-Purpose Computing on GPUs (GPGPU). Distance calculation is a core part of many machine learning algorithms because the closer a query is to some samples (i.e., observations, records, entries), the more likely the query belongs to the class of those samples. K-Nearest Neighbors Search (k-NNS) is a popular and powerful distance-based tool for solving classification problems, and it is a prerequisite for training local-model-based classifiers. Fast distance calculation can significantly improve the speed of these classifiers, and GPUs are well suited to accelerating it. Several GPU-based sorting algorithms are also included to sort the distance matrix and find the k nearest neighbors; the speed of these sorting algorithms varies depending on the input sequences. The GPUKNN tool proposed in this dissertation utilizes the GPU-based distance computation algorithm and automatically picks the most suitable sorting algorithm according to the characteristics of the input datasets. Every machine learning tool has its own pros and cons. The advantage of SVM is its high classification accuracy, which makes SVM possibly the best classification tool. However, as with many other machine learning algorithms, SVM's training phase slows down as the size of the input dataset increases.
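The distance-matrix-plus-selection pipeline described above is compact to state in NumPy. A serial sketch of the computation that GPUKNN parallelizes (the function name and shapes are illustrative, not from the dissertation):

```python
import numpy as np

def knn_search(train, query, k):
    """Return, for each query point, the indices of its k nearest
    training points under Euclidean distance."""
    # squared-distance matrix via |q - t|^2 = |q|^2 - 2 q.t + |t|^2
    d2 = (np.sum(query**2, axis=1)[:, None]
          - 2.0 * query @ train.T
          + np.sum(train**2, axis=1)[None, :])
    # partial selection plays the role of the GPU sort:
    # only the k smallest entries of each row matter
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # order those k candidates by their actual distances
    rows = np.arange(len(query))[:, None]
    return idx[rows, np.argsort(d2[rows, idx], axis=1)]

train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
query = np.array([[0.1, 0.1]])
neighbors = knn_search(train, query, 2)
```

On a GPU each distance-matrix entry is an independent computation (one thread per pair), which is exactly the data-level parallelism the dissertation exploits.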
The GPU version of parallel SVM implemented in this dissertation, based on parallel Sequential Minimal Optimization (SMO), is proposed to reduce the time cost of both the training and predicting phases. This implementation of GPUSVM is original. It utilizes many parallel processing techniques to accelerate and minimize kernel evaluations, which are the most time-consuming operations in SVM. Although the many-core architecture of the GPU performs best with data-level parallelism, multi-task (i.e., task-level) processing is also integrated into the application to improve the speed of tasks such as multiclass classification and cross-validation. Furthermore, the procedure of finding the worst violators is distributed across multiple blocks in the CUDA model, which reduces the time cost of each SMO iteration during the training phase. These violators are shared among different tasks in multiclass classification and cross-validation to avoid duplicate kernel computations. The speed results show that the achieved speedups of both the training and predicting phases range from one to three orders of magnitude over the state-of-the-art LIBSVM software on some well-known benchmark datasets.
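Kernel evaluation, named above as SVM's dominant cost, is itself an all-pairs computation. A vectorized sketch of an RBF (Gaussian) kernel matrix, the kind of quantity a GPU implementation computes with one entry or tile per thread or block (illustrative, not the GPUSVM code):

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma):
    """K[i, j] = exp(-gamma * |X[i] - Y[j]|^2), with no explicit
    Python loops; the pairwise structure maps directly onto a GPU grid."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          - 2.0 * X @ Y.T
          + np.sum(Y**2, axis=1)[None, :])
    # clamp tiny negative values caused by floating-point cancellation
    return np.exp(-gamma * np.maximum(d2, 0.0))

X = np.array([[0.0, 0.0], [1.0, 1.0]])
K = rbf_kernel_matrix(X, X, gamma=0.5)
```

Sharing rows of such a matrix between the multiclass and cross-validation tasks is what lets the dissertation's implementation avoid recomputing duplicate kernel entries.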

Programming Massively Parallel Processors

Programming Massively Parallel Processors PDF Author: David B. Kirk
Publisher: Newnes
ISBN: 0123914183
Category : Computers
Languages : en
Pages : 519


Book Description
Programming Massively Parallel Processors: A Hands-on Approach, Second Edition, teaches students how to program massively parallel processors. It offers a detailed discussion of various techniques for constructing parallel programs. Case studies are used to demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs. This guide shows both student and professional alike the basic concepts of parallel programming and GPU architecture. Topics of performance, floating-point format, parallel patterns, and dynamic parallelism are covered in depth. This revised edition contains more parallel programming examples, commonly used libraries such as Thrust, and explanations of the latest tools. It also provides new coverage of CUDA 5.0, improved performance, enhanced development tools, increased hardware support, and more; increased coverage of related technology, OpenCL, and new material on algorithm patterns, GPU clusters, host programming, and data parallelism; and two new case studies (on MRI reconstruction and molecular visualization) that explore the latest applications of CUDA and GPUs for scientific research and high-performance computing. This book should be a valuable resource for advanced students, software engineers, programmers, and hardware engineers.
- New coverage of CUDA 5.0, improved performance, enhanced development tools, increased hardware support, and more
- Increased coverage of related technology, OpenCL, and new material on algorithm patterns, GPU clusters, host programming, and data parallelism
- Two new case studies (on MRI reconstruction and molecular visualization) explore the latest applications of CUDA and GPUs for scientific research and high-performance computing

Accelerating the Adaptive Tempering Monte Carlo Method with CUDA Graphics Processing Units

Accelerating the Adaptive Tempering Monte Carlo Method with CUDA Graphics Processing Units PDF Author: Clifford T. Hall
Publisher:
ISBN:
Category : Conjugated polymers
Languages : en
Pages : 117


Book Description
Molecular Dynamics (MD) has been and continues to be a popular method of molecular simulation because it is easily parallelizable. Parallel programming has become less burdensome for the science community, and competition in MD algorithm development has given MD an avant-garde position in molecular, bio-systems, materials, and nano-systems simulation. In contrast, inherently serial Monte Carlo (MC) methods have been largely ignored in recent advancements of parallel computing technology. The trend persists even though MC methods based on statistical mechanics principles are superior for studying thermodynamic properties such as entropy and free energy. In my dissertation I present a means of parallelizing MC molecular simulation so that, in time, the popularity of MC may be restored to that of MD. The Adaptive Tempering Monte Carlo (ATMC) method employs the Metropolis MC (MMC) sampling criterion; therefore, both ATMC and MMC are inherently serial algorithms. ATMC is a multicanonical-ensemble algorithm that optimizes the system configuration by searching for the most ordered state. This algorithm was developed by Dong and Blaisten-Barojas in 2006. My algorithm accelerates ATMC and MMC in a novel implementation exploiting state-of-the-art parallel processing technology, namely NVIDIA's Compute Unified Device Architecture (CUDA) Graphics Processing Units (GPUs). My implementation source code is written in CUDA C, NVIDIA's extension to the C programming language for parallel programming, and compiled by NVCC, NVIDIA's CUDA version 4.0 C compiler. My CUDA GPU-accelerated implementation is verified against a 2010 study by Dai and Blaisten-Barojas of pyrrole oligomers (specifically, 12-Py chains), an interesting material for its applications in artificial muscles, actuators, and chemical remediation, among others. This previous study put forward a partially coarse-grained model potential for reduced pyrrole oligomers at the polypyrrole experimental density.
I introduced a revision to this potential model appropriate for condensed phases of oligopyrroles. Verification includes comparison of total potential energy, intra-oligomer energy, inter-oligomer energy, end-to-end distance, radius of gyration, and two order parameters that characterize chain ordering in the condensed phase. Bending and dihedral angles are also examined. In addition, I performed a benchmark of my accelerated algorithms that shows a speed-up factor greater than 60 with respect to the CPU implementation. This extremely fast performance is reached for systems larger than about 250,000 pyrrole monomers. Speed-ups in this range are unique in the published literature. A journal article is in preparation to report this achievement. My novel accelerated implementation has already been applied in a study of oxidized oligopyrroles. A contributed talk was presented at the American Physical Society March Meeting in Baltimore, March 2013, and the work is soon to be published in a physical chemistry journal.
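The Metropolis criterion named above accepts a trial move from energy E_old to E_new with probability min(1, exp(-β ΔE)). A minimal serial sketch on a toy one-particle system (illustrative only, not the dissertation's CUDA code or pyrrole potential):

```python
import math
import random

def metropolis_step(state, energy, propose, beta, rng):
    """One Metropolis MC step: propose a trial move and accept it
    with probability min(1, exp(-beta * dE))."""
    trial = propose(state, rng)
    d_e = energy(trial) - energy(state)
    if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
        return trial          # accepted
    return state              # rejected

# toy system: one particle in a 1-D harmonic well, E(x) = x^2
rng = random.Random(42)
x = 5.0                       # start far from the minimum
for _ in range(5000):
    x = metropolis_step(x, lambda s: s * s,
                        lambda s, r: s + r.uniform(-0.5, 0.5),
                        beta=2.0, rng=rng)
```

The accept/reject chain itself is the serial bottleneck the dissertation discusses; GPU acceleration targets the expensive energy evaluations inside each step (and, in ATMC, many tempered replicas), not the chain logic.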