From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Apr 14 2015 - 11:01:49 CDT
You can’t without modifying the code, but you also don’t need to. There will be no big difference in performance between:
2 replicas sharing 2 GPUs = 1 GPU / replica
2 replicas having exclusive GPUs = 1 GPU / replica
From: Michael Feig [mailto:feig_at_msu.edu]
Sent: Tuesday, April 14, 2015 3:00 PM
To: Norman Geist
Subject: Re: namd-l: REMD on GPU cluster
Thanks. This makes sense. But if there are two GPUs on one node, how can I run two replicas, each using one of them?
Based on your post, it sounds like this may not work.
Norman Geist <norman.geist_at_uni-greifswald.de> wrote:
Basically, you need to know that mpirun starts the processes round robin until it has reached the number of processes given. If it reaches the bottom of the machinefile and still has processes to start, it starts again from the top of the file.
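As an illustration only, the round-robin placement described above can be modelled in a few lines of Python. This is a sketch of the behaviour as stated in this message, not of mpirun itself: the host names are hypothetical machinefile entries, each host is assumed to be listed once, and real placement depends on the MPI implementation and its mapping options.

```python
from collections import defaultdict


def round_robin_placement(hosts, num_procs):
    """Assign process ranks to hosts by cycling through the host list."""
    placement = defaultdict(list)
    for rank in range(num_procs):
        # Wrap around to the top of the list once the bottom is reached.
        host = hosts[rank % len(hosts)]
        placement[host].append(rank)
    return dict(placement)


# Hypothetical machinefile with two nodes, one entry each.
hosts = ["node1", "node2"]
print(round_robin_placement(hosts, 4))
```

With two hosts and four processes, ranks 0 and 2 land on the first host and ranks 1 and 3 on the second.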
Apart from that, the total number of cores must be a multiple of the number of replicas, and the number of nodes should also be divisible by the number of replicas. Processes started one after another belong to the same replica.
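A minimal sketch of that grouping rule, assuming consecutive process ranks are split into equally sized blocks, one block per replica. The function name and the example numbers are hypothetical and are not taken from NAMD.

```python
def replica_of_rank(rank, num_procs, num_replicas):
    """Return the replica index a process rank belongs to.

    Assumes processes started one after another share a replica, so
    ranks are grouped into consecutive, equally sized blocks.
    """
    if num_procs % num_replicas != 0:
        raise ValueError("total process count must be a multiple of the replica count")
    procs_per_replica = num_procs // num_replicas
    return rank // procs_per_replica


# Example: 4 processes, 2 replicas.
for rank in range(4):
    print(rank, replica_of_rank(rank, 4, 2))
```

Here ranks 0 and 1 form replica 0 and ranks 2 and 3 form replica 1. A process count that is not a multiple of the replica count is rejected.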
Beyond that, you need to treat each replica, with its processes, as an individual simulation. So if you do not specify +devices, it will use all usable devices on the nodes it runs on and automatically do the core<->GPU assignment.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Michael Feig
Sent: Tuesday, April 14, 2015 12:09 AM
Subject: namd-l: REMD on GPU cluster
Basically my question is how to run NAMD with REMD across multiple nodes each with 1 or 2 GPUs.
I have seen previous posts but I am unsure about how exactly to go about it.
More specifically, for optimal performance it seems that I would want each replica to use 1 GPU plus maybe 10 CPU cores. If I then wanted to run 10 replicas, would I simply request 10 GPUs and 100 cores (or 10 GPU nodes exclusively) and then use mpirun? How will NAMD figure out which cores ‘belong’ to which GPU, or do I not need to worry about it? Does anybody have experience with REMD on such a cluster with a recent NAMD version (2.10 or newer)?
Michael Feig, Ph.D.
Biochemistry & Molecular Biology; Chemistry
Michigan State University