Author: Alexander Schöll
Publisher:
ISBN:
Category :
Languages : en
Pages :
Book Description
Efficient Fault Tolerance for Selected Scientific Computing Algorithms on Heterogeneous and Approximate Computer Architectures
Author: Alexander Schöll
Publisher:
ISBN:
Category :
Languages : en
Pages :
Book Description
Fault-Tolerance Techniques for High-Performance Computing
Author: Thomas Herault
Publisher: Springer
ISBN: 3319209434
Category : Computers
Languages : en
Pages : 325
Book Description
This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
The Evolution of Fault-Tolerant Computing
Author: A. Avizienis
Publisher: Springer Science & Business Media
ISBN: 3709188717
Category : Computers
Languages : en
Pages : 467
Book Description
For the editors of this book, as well as for many other researchers in the area of fault-tolerant computing, Dr. William Caswell Carter is one of the key figures in the formation and development of this important field. We felt that the IFIP Working Group 10.4 at Baden, Austria, in June 1986, which coincided with an important step in Bill's career, was an appropriate occasion to honor Bill's contributions and achievements by organizing a one day "Symposium on the Evolution of Fault-Tolerant Computing" in the honor of William C. Carter. The Symposium, held on June 30, 1986, brought together a group of eminent scientists from all over the world to discuss the evolution, the state of the art, and the future perspectives of the field of fault-tolerant computing. Historic developments in academia and industry were presented by individuals who themselves have actively been involved in bringing them about. The Symposium proved to be a unique historic event and these Proceedings, which contain the final versions of the papers presented at Baden, are an authentic reference document.
Fault Tolerant Computer Architecture
Author: Daniel Sorin
Publisher: Morgan & Claypool Publishers
ISBN: 1598299549
Category : Technology & Engineering
Languages : en
Pages : 116
Book Description
For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state-of-the-art - over approximately the past 10 years - in academia and industry. Table of Contents: Introduction / Error Detection / Error Recovery / Diagnosis / Self-Repair / The Future
Hardware and Software Architectures for Fault Tolerance
Author: Michel Banatre
Publisher: Springer Science & Business Media
ISBN: 9783540577676
Category : Computers
Languages : en
Pages : 332
Book Description
Fault tolerance has been an active research area for many years. This volume presents papers from a workshop held in 1993 where a small number of key researchers and practitioners in the area met to discuss the experiences of industrial practitioners, to provide a perspective on the state of the art of fault tolerance research, to determine whether the subject is becoming mature, and to learn from the experiences so far in order to identify what might be important research topics for the coming years. The workshop provided a more intimate environment for discussions and presentations than usual at conferences. The papers in the volume were presented at the workshop, then updated and revised to reflect what was learned at the workshop.
Fault-tolerant Computing Systems
Author: Fevzi Belli
Publisher:
ISBN:
Category : Fault-tolerant computing
Languages : de
Pages : 412
Book Description
Fault-tolerant Computing
Author: Dhiraj K. Pradhan
Publisher: Prentice Hall
ISBN:
Category : Computer software
Languages : en
Pages : 312
Book Description
Fault-tolerant computing has evolved into a broad discipline, one that encompasses all aspects of reliable computer design. Diverse areas of fault-tolerant study range from failure mechanisms in integrated circuits to the design of robust software. Fault-tolerant computing is driven by a number of key factors, including ultra-high reliability, reduced life-cycle costs, and long-life applications. This book is intended to be both introductory and suitable for advanced-level graduates. Chapters can be selected in various combinations to provide courses with different orientations.
Fault-tolerant Computing
Author:
Publisher:
ISBN:
Category :
Languages : en
Pages : 145
Book Description
Fault Tolerance, Principles and Practice
Author: P. A. Lee
Publisher: Springer
ISBN:
Category : Computers
Languages : en
Pages : 344
Book Description
University of Michigan Official Publication
Author: University of Michigan
Publisher: UM Libraries
ISBN:
Category : Education, Higher
Languages : en
Pages : 212
Book Description
Each number is the catalogue of a specific school or college of the University.