Presentation

Fault Tolerance in Krylov Subspace Methods
Description
Today's complex HPC systems are incredibly powerful, yet they are also prone to failures. The scientific applications that run on these systems are mostly iterative in nature. Iterative solvers have some inherent fault tolerance, but they are still susceptible to errors. One subset of these iterative methods is the Krylov Subspace Methods. There has been limited research on the fault tolerance of these methods against soft errors. Preconditioned Conjugate Gradient (PCG) is known to be self-correcting in nature, but much less is known about the other Krylov Subspace Methods. Our goal is to study how errors introduced in the Sparse Matrix-Vector Multiplication (SpMV) operation propagate in the Lanczos Method, the Bi-Conjugate Gradient (BiCG) Method, and the PCG Method. Using the results of these experiments and knowledge from previous work, we will generalize our findings to all Krylov Subspace Methods.
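To make the idea of a soft error in SpMV concrete, the following is a minimal sketch of a fault-injection experiment on unpreconditioned Conjugate Gradient. It is not the authors' setup: the test matrix (a 1D Laplacian), the additive fault model, and the names `run_cg`, `fault_iter`, `fault_index`, and `fault_delta` are all assumptions chosen for illustration. The sketch reports both the residual tracked by the CG recurrence and the true residual of the returned solution, so the two can be compared with and without the injected fault.

```python
# Illustrative sketch only: hypothetical fault model, not the poster's method.

def laplacian_1d(n):
    """Sparse 1D Laplacian (SPD); rows[i] is a list of (column, value) pairs."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows

def spmv(rows, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[j] for j, v in row) for row in rows]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def run_cg(rows, b, fault_iter=None, fault_index=0, fault_delta=1.0,
           tol=1e-10, max_iter=200):
    """Plain CG; optionally corrupts one entry of one SpMV result."""
    x = [0.0] * len(b)
    r = list(b)          # residual for the initial guess x = 0
    p = list(r)
    rs = dot(r, r)
    res_norm = rs ** 0.5
    iterations = 0
    for k in range(max_iter):
        q = spmv(rows, p)
        if k == fault_iter:
            # Assumed fault model: a single additive perturbation.
            q[fault_index] += fault_delta
        alpha = rs / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        res_norm = rs_new ** 0.5
        iterations = k + 1
        if res_norm < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    # Recompute the residual from scratch with a fault-free SpMV.
    ax = spmv(rows, x)
    true_res = sum((bi - ai) ** 2 for bi, ai in zip(b, ax)) ** 0.5
    return {"x": x, "iterations": iterations,
            "recurrence_residual": res_norm, "true_residual": true_res}

n = 20
A = laplacian_1d(n)
b = [1.0] * n
clean = run_cg(A, b)
faulty = run_cg(A, b, fault_iter=3, fault_index=5, fault_delta=1.0)
for name, result in (("clean", clean), ("faulty", faulty)):
    print(name, result["iterations"],
          result["recurrence_residual"], result["true_residual"])
```

Varying `fault_iter`, `fault_index`, and `fault_delta` gives a rough picture of how a single corrupted SpMV result affects convergence. A study of the kind described here would presumably apply the same idea to Lanczos, BiCG, and PCG with a more realistic fault model such as bit flips.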
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Doctoral Showcase
Posters
Time
Tuesday, 19 November 2024, 12pm - 5pm EST
Location
B302-B305
Registration Categories
TP
XO/EX