Efficient Fault Tolerance for Selected Scientific Computing Algorithms on Heterogeneous and Approximate Computer Architectures

Author: Alexander Schöll
Publisher:
ISBN:
Category:
Languages: en
Pages:


Fault-Tolerance Techniques for High-Performance Computing

Author: Thomas Herault
Publisher: Springer
ISBN: 3319209434
Category: Computers
Languages: en
Pages: 325

Book Description
This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

The Evolution of Fault-Tolerant Computing

Author: A. Avizienis
Publisher: Springer Science & Business Media
ISBN: 3709188717
Category: Computers
Languages: en
Pages: 467

Book Description
For the editors of this book, as well as for many other researchers in the area of fault-tolerant computing, Dr. William Caswell Carter is one of the key figures in the formation and development of this important field. We felt that the IFIP Working Group 10.4 meeting at Baden, Austria, in June 1986, which coincided with an important step in Bill's career, was an appropriate occasion to honor Bill's contributions and achievements by organizing a one-day "Symposium on the Evolution of Fault-Tolerant Computing" in honor of William C. Carter. The Symposium, held on June 30, 1986, brought together a group of eminent scientists from all over the world to discuss the evolution, the state of the art, and the future perspectives of the field of fault-tolerant computing. Historic developments in academia and industry were presented by individuals who have themselves been actively involved in bringing them about. The Symposium proved to be a unique historic event, and these Proceedings, which contain the final versions of the papers presented at Baden, are an authentic reference document.

Fault Tolerant Computer Architecture

Author: Daniel Sorin
Publisher: Morgan & Claypool Publishers
ISBN: 1598299549
Category: Technology & Engineering
Languages: en
Pages: 116

Book Description
For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state-of-the-art - over approximately the past 10 years - in academia and industry.
Table of Contents: Introduction / Error Detection / Error Recovery / Diagnosis / Self-Repair / The Future

Hardware and Software Architectures for Fault Tolerance

Author: Michel Banatre
Publisher: Springer Science & Business Media
ISBN: 9783540577676
Category: Computers
Languages: en
Pages: 332

Book Description
Fault tolerance has been an active research area for many years. This volume presents papers from a workshop held in 1993 where a small number of key researchers and practitioners in the area met to discuss the experiences of industrial practitioners, to provide a perspective on the state of the art of fault tolerance research, to determine whether the subject is becoming mature, and to learn from the experiences so far in order to identify what might be important research topics for the coming years. The workshop provided a more intimate environment for discussions and presentations than usual at conferences. The papers in the volume were presented at the workshop, then updated and revised to reflect what was learned at the workshop.

Fault-tolerant Computing Systems

Author: Fevzi Belli
Publisher:
ISBN:
Category: Fault-tolerant computing
Languages: de
Pages: 412


Fault-tolerant Computing

Author: Dhiraj K. Pradhan
Publisher: Prentice Hall
ISBN:
Category: Computer software
Languages: en
Pages: 312

Book Description
Fault-tolerant computing has evolved into a broad discipline, one that encompasses all aspects of reliable computer design. Diverse areas of fault-tolerant study range from failure mechanisms in integrated circuits to the design of robust software. Fault-tolerant computing is driven by a number of key factors, including ultra-high reliability, reduced life-cycle costs, and long-life applications. This book is intended to be both introductory and suitable for advanced-level graduates. Chapters can be selected in various combinations to provide courses with different orientations.

Fault-tolerant Computing

Author:
Publisher:
ISBN:
Category:
Languages: en
Pages: 145


Fault Tolerance, Principles and Practice

Author: P. A. Lee
Publisher: Springer
ISBN:
Category: Computers
Languages: en
Pages: 344


University of Michigan Official Publication

Author: University of Michigan
Publisher: UM Libraries
ISBN:
Category: Education, Higher
Languages: en
Pages: 212

Book Description
Each number is the catalogue of a specific school or college of the University.