Dooley, Isaac; Mei, Chao; Kale, Laxmikant
NOISEMINER: An algorithm for scalable automatic computational noise and software interference detection
2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, 582-589, 2008

This paper describes NOISEMINER, a new scalable stream mining algorithm that analyzes parallel application traces to detect computational noise, operating system interference, software interference, or other irregularities in a parallel application's performance. The algorithm detects these occurrences of noise during real application runs, whereas standard noise-detection techniques rely on carefully crafted test programs. The paper concludes by showing the output of NOISEMINER for a real-world case in which 6 ms delays, caused by a bug in an MPI implementation, significantly limited the performance of a molecular dynamics code on a new supercomputer.
