RE: REMD on GPU cluster

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Apr 14 2015 - 02:10:03 CDT

Hey,

Basically, you need to know that mpirun starts the processes round-robin
until it has reached the number of processes given. If it reaches the bottom
of the machinefile and still has processes to start, it starts again from the
top of the file.
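As a rough illustration of the wrap-around behaviour described above, here is
a minimal Python sketch. It assumes a simplified machinefile with one host
entry per line and no "slots=" syntax; the function name place_ranks and the
host names are hypothetical, and real mpirun implementations differ in their
default mapping policy, so treat this as a model, not as what any particular
MPI actually does.

```python
def place_ranks(machinefile_lines, num_procs):
    """Assign rank i to machinefile entry (i modulo number of entries).

    Simplified model: one host entry per non-empty line, wrapping back to
    the top of the file once the bottom is reached.
    """
    hosts = [line.strip() for line in machinefile_lines if line.strip()]
    if not hosts:
        raise ValueError("machinefile has no host entries")
    return [hosts[rank % len(hosts)] for rank in range(num_procs)]


if __name__ == "__main__":
    # Hypothetical machinefile: two nodes, each listed twice.
    machinefile = ["node01", "node01", "node02", "node02"]
    for rank, host in enumerate(place_ranks(machinefile, 6)):
        print(f"rank {rank} -> {host}")
```

With six processes and four entries, ranks 4 and 5 wrap around and land on
the first node again.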

Apart from that, the total number of cores must be a multiple of the number
of replicas, and the number of nodes should also be divisible by the number
of replicas. Processes started one after another belong to the same replica.
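The grouping rule above can be sketched in Python as follows. This assumes
that the processes are split into equal, contiguous blocks of ranks, one
block per replica, which is how I read "started after each other"; the
function names and the example placement are made up for illustration.

```python
def replica_of_rank(rank, num_procs, num_replicas):
    """Return the replica index for a rank, assuming contiguous blocks."""
    if num_procs % num_replicas != 0:
        raise ValueError("process count must be a multiple of replica count")
    return rank // (num_procs // num_replicas)


def hosts_per_replica(placement, num_replicas):
    """Given the host of each rank (in rank order), list hosts per replica."""
    num_procs = len(placement)
    if num_procs % num_replicas != 0:
        raise ValueError("process count must be a multiple of replica count")
    block = num_procs // num_replicas
    return [
        sorted(set(placement[r * block:(r + 1) * block]))
        for r in range(num_replicas)
    ]


if __name__ == "__main__":
    # Hypothetical placement: 4 processes, 2 per node, 2 replicas.
    placement = ["node01", "node01", "node02", "node02"]
    for replica, hosts in enumerate(hosts_per_replica(placement, 2)):
        print(f"replica {replica} runs on {hosts}")
```

Checking the host list per replica before submitting is a cheap way to see
whether each replica stays on one node or ends up spread over several.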

Beyond that, you need to treat each replica, together with its processes, as
an individual simulation. So if you do not specify +devices, it will use all
usable devices on the nodes it runs on and do the core<->GPU assignment
automatically.

Norman Geist.

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Michael Feig
Sent: Tuesday, April 14, 2015 12:09 AM
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: REMD on GPU cluster

Basically, my question is how to run NAMD with REMD across multiple nodes,
each with 1 or 2 GPUs.

I have seen previous posts, but I am unsure exactly how to go about it.

More specifically, for optimal performance it seems that I would want each
replica to use 1 GPU plus maybe 10 CPU cores. If I then wanted to run 10
replicas, would I simply request 10 GPUs and 100 cores (or 10 GPU nodes
exclusively) and then use mpirun? How will NAMD figure out which cores
'belong' to which GPU, or do I not need to worry about it? Does anybody have
experience with REMD on such a cluster with a recent NAMD version (2.10 or
newer)?

Thanks.

-------------------------------------------------------
Michael Feig, Ph.D.
Professor
Biochemistry & Molecular Biology; Chemistry
Michigan State University
feig_at_msu.edu
