RE: REMD on GPU cluster

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Apr 14 2015 - 11:01:49 CDT

You can’t without modifying the code, but you also don’t need to. It makes no big difference in performance whether you have

 

2 replicas sharing 2 GPUs = 1 GPU / replica

or

2 replicas having exclusive GPUs = 1 GPU / replica

 

Norman Geist.

 

From: Michael Feig [mailto:feig_at_msu.edu]
Sent: Tuesday, April 14, 2015 3:00 PM
To: Norman Geist
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: REMD on GPU cluster

 

Thanks. This makes sense. But if there are two GPUs on one node, how can I run two replicas, each using one of them?

 

Based on your post, it sounds like this may not work.

 

-----

Michael Feig

feig_at_msu.edu

Norman Geist <norman.geist_at_uni-greifswald.de> wrote:

Hey,

 

Basically, you need to know that mpirun starts the processes round-robin until it has reached the number of processes given. If it reaches the bottom of the machinefile and still has processes to start, it starts again from the top of the file.
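
As a rough illustration of the round-robin behavior described above, here is a minimal Python sketch. It only models the description in this message; the host names are hypothetical, and real placement depends on the MPI implementation and on how slots are listed in the machinefile.

```python
def round_robin_placement(hosts, num_procs):
    """Return the machinefile entry chosen for each process rank.

    Models the description above: entries are used top to bottom, and
    the list wraps around to the top when it runs out.
    """
    return [hosts[rank % len(hosts)] for rank in range(num_procs)]


# Hypothetical machinefile with three entries, seven processes requested.
hosts = ["node01", "node02", "node03"]
placement = round_robin_placement(hosts, 7)
for rank, host in enumerate(placement):
    print(rank, host)
```

With three entries and seven processes, the seventh process wraps around and lands on the first entry again.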

 

Apart from that, the total number of cores must be a multiple of the number of replicas, and the number of nodes should also be divisible by the number of replicas. Processes started one after another belong to the same replica.
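
The grouping rule above (consecutive processes belong to the same replica) can be sketched as follows. This is an illustration of the rule as stated in this message, not NAMD code; the function name is made up for the example.

```python
def replica_of_rank(rank, num_procs, num_replicas):
    """Return the replica index that a process rank would belong to.

    Assumes consecutive ranks are grouped into equally sized replicas,
    which requires the process count to be a multiple of the replica count.
    """
    if num_procs % num_replicas != 0:
        raise ValueError("number of processes must be a multiple of the number of replicas")
    procs_per_replica = num_procs // num_replicas
    return rank // procs_per_replica


# 8 processes split over 2 replicas: ranks 0-3 form replica 0, ranks 4-7 form replica 1.
print([replica_of_rank(r, 8, 2) for r in range(8)])
```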

 

Beyond that, you need to see each replica with its processes as an individual simulation. So if you do not specify +devices, each replica will use all usable devices on the nodes it runs on and automatically do the core<->GPU assignment.
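
To see which nodes (and therefore which GPUs) a replica would end up on, the following Python sketch models the placement and grouping as described in this message: machinefile entries are used top to bottom with wrap-around, and consecutive processes form one replica. The host names and the helper function are hypothetical, and actual placement depends on the MPI implementation.

```python
from collections import defaultdict


def nodes_per_replica(hosts, num_procs, num_replicas):
    """Map each replica index to the sorted list of hosts its processes land on."""
    procs_per_replica = num_procs // num_replicas
    nodes = defaultdict(set)
    for rank in range(num_procs):
        host = hosts[rank % len(hosts)]      # wrap around the machinefile
        replica = rank // procs_per_replica  # consecutive ranks share a replica
        nodes[replica].add(host)
    return {replica: sorted(h) for replica, h in nodes.items()}


# Each node listed once: both replicas span both nodes and share their GPUs.
print(nodes_per_replica(["node01", "node02"], 4, 2))

# Each node listed twice in a row: each replica stays on its own node.
print(nodes_per_replica(["node01", "node01", "node02", "node02"], 4, 2))
```

In the first layout each replica has processes on both nodes, so without +devices it would use the GPUs of both; in the second each replica is confined to one node. In both cases the ratio works out to one GPU per replica if each node has one GPU.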

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Michael Feig
Sent: Tuesday, April 14, 2015 12:09 AM
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: REMD on GPU cluster

 

Basically my question is how to run NAMD with REMD across multiple nodes each with 1 or 2 GPUs.

 

I have seen previous posts but I am unsure about how exactly to go about it.

 

More specifically, for optimal performance it seems that I would want each replica to use 1 GPU plus maybe 10 CPU cores. If I then wanted to run 10 replicas, would I simply request 10 GPUs and 100 cores (or 10 GPU nodes exclusively) and then use mpirun? How will NAMD figure out which cores ‘belong’ to which GPU, or do I not need to worry about it? Does anybody have experience with REMD on such a cluster with a recent NAMD version (2.10 or newer)?

 

Thanks.

 

-------------------------------------------------------

Michael Feig, Ph.D.

Professor

Biochemistry & Molecular Biology; Chemistry

Michigan State University

feig_at_msu.edu

 

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:21:47 CST