From: Matthew Ralph Adendorff (mraden_at_mit.edu)
Date: Tue May 13 2014 - 09:25:44 CDT
We have recently deployed a new Bright Cluster Manager HPC server (RHEL6) that runs on a Mellanox InfiniBand fabric and has dual NVIDIA K20s per node. I can successfully run the NAMD 2.9 CUDA build and the nightly-build CUDA version (both on single nodes with one or two GPUs), as well as an MPI build. However, I have trouble launching the SMP-Ibverbs-CUDA build: I receive an error that the CUDA runtime does not match the driver. This error never occurs when the other two CUDA versions are launched with the same environment settings.
I am wondering whether this is an issue with the parameters sent to the SLURM scheduler and its deployment of the correct resources. Perhaps someone has advice, or has had some success on such a system? Would it be better to use Torque for this task? Could this be a conflict with the libcudart library?
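One way to test the libcudart hypothesis is to compare the CUDA version that the installed driver supports with the version of the runtime library that the failing binary loads. The following is a minimal sketch, not a definitive diagnostic. It assumes Python with ctypes is available on the compute node, and the default library names `libcudart.so` and `libcuda.so.1` are assumptions that may need to be replaced with full paths. The helper names are hypothetical; the only CUDA calls used are `cudaRuntimeGetVersion` and `cuDriverGetVersion`.

```python
import ctypes


def decode_version(v):
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return v // 1000, (v % 1000) // 10


def driver_supports_runtime(driver_v, runtime_v):
    """A runtime can only be used if the driver supports at least that CUDA version."""
    return driver_v >= runtime_v


def query_versions(cudart_path="libcudart.so", cuda_path="libcuda.so.1"):
    """Return (driver_version, runtime_version) as raw CUDA version integers.

    The default paths are assumptions; pass the libcudart that the failing
    binary actually resolves to.
    """
    cudart = ctypes.CDLL(cudart_path)
    libcuda = ctypes.CDLL(cuda_path)

    runtime_v = ctypes.c_int(0)
    driver_v = ctypes.c_int(0)

    # Both calls return 0 on success and write the version through the pointer.
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime_v)) != 0:
        raise RuntimeError("cudaRuntimeGetVersion failed")
    if libcuda.cuDriverGetVersion(ctypes.byref(driver_v)) != 0:
        raise RuntimeError("cuDriverGetVersion failed")

    return driver_v.value, runtime_v.value


def report(driver_v, runtime_v):
    """Format a one-line summary of the comparison."""
    d = "%d.%d" % decode_version(driver_v)
    r = "%d.%d" % decode_version(runtime_v)
    status = "OK" if driver_supports_runtime(driver_v, runtime_v) else "MISMATCH"
    return "driver supports CUDA %s, runtime is CUDA %s: %s" % (d, r, status)
```

Running `print(report(*query_versions(path)))` inside a job on a compute node, with `path` set to the libcudart that `ldd` reports for the SMP-Ibverbs-CUDA binary, and then again for one of the working builds, would show whether the failing build picks up a newer runtime than the driver supports.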
Any advice would be greatly appreciated. Thank you for such an excellent support network.
Laboratory for Computational Biology and Biophysics
Department of Biological Engineering
Massachusetts Institute of Technology