From: Martin, Erik W (Erik.Martin_at_stjude.org)
Date: Thu Nov 15 2012 - 18:38:55 CST
So, I can't imagine this has come up all that often, if ever, but I work in a group where I use namd and some others use pmemd. A fellow and I recently started running calculations simultaneously on a machine with 4 Tesla GPUs, and the result was that our jobs crashed. At first it seemed that starting a namd or pmemd job would crash the job that was already running. Then both jobs crashed. Finally, the machine froze so that neither namd nor pmemd jobs would start (regardless of whether the other was running). His outputs gave no clues to the cause. My namd output only told me that there was a timeout on one of the two GPUs I'd defined for the job. I should also say that we were both explicitly selecting the IDs of the GPUs we were using so as not to cause any conflict.
I know this is an odd case, but does anyone have any clue to the cause of this problem? Or has anyone seen a conflict before when trying to run namd simultaneously with another CUDA-accelerated program?
Thanks a lot,
Erik