Re: REMD on GPU cluster

From: Michael Feig (feig_at_msu.edu)
Date: Tue Apr 14 2015 - 08:00:16 CDT

Thanks. This makes sense. But if there are two GPUs on one node, how can I run two replicas, each using one of them?

Based on your post, it sounds like this may not work.

-----
Michael Feig
feig_at_msu.edu

Norman Geist <norman.geist_at_uni-greifswald.de> wrote:

>
>Hey,
>

>
>Basically, you need to know that mpirun starts the processes round robin until it has reached the number of processes given. If it reaches the bottom of the machinefile and still has processes to start, it starts again from the top of the file.
>
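
The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration, assuming a machinefile with one host per line and the line-by-line wrap-around behaviour described in the message; the host names and the function `round_robin_placement` are hypothetical, and real MPI launchers may map processes differently depending on their defaults.

```python
def round_robin_placement(hosts, num_procs):
    """Assign each rank to a machinefile line, wrapping back to the top."""
    return [hosts[rank % len(hosts)] for rank in range(num_procs)]

# Hypothetical machinefile with three lines, one host per line.
hosts = ["node1", "node2", "node3"]

# Seven processes: the launcher runs off the bottom of the file twice.
placement = round_robin_placement(hosts, 7)
for rank, host in enumerate(placement):
    print(rank, host)
```

With three hosts and seven processes, ranks 0 to 2 land on the three hosts in order, and rank 3 starts again at the first line.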

>
>Apart from that, the total number of cores must be a multiple of the number of replicas, and the number of nodes should also be divisible by the number of replicas. Processes started one after another belong to the same replica.
>
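
To see how consecutive processes map onto replicas, here is a small Python sketch. It assumes the line-by-line round-robin placement and the "consecutive ranks form one replica" rule stated in the message; the function `replica_hosts` and the host names are hypothetical, and this is not a description of how NAMD itself is implemented.

```python
def replica_hosts(hosts, num_procs, num_replicas):
    """Group consecutive ranks into replicas and list the hosts each one uses."""
    if num_procs % num_replicas != 0:
        raise ValueError("process count must be a multiple of the replica count")
    per_replica = num_procs // num_replicas
    # Round-robin placement: rank i goes to machinefile line i, wrapping around.
    placement = [hosts[rank % len(hosts)] for rank in range(num_procs)]
    # Consecutive ranks belong to the same replica.
    return [placement[r * per_replica:(r + 1) * per_replica]
            for r in range(num_replicas)]

# Each host listed once: every replica is spread across both nodes.
spread = replica_hosts(["node1", "node2"], 4, 2)

# Each host listed on consecutive lines: every replica stays on one node.
packed = replica_hosts(["node1", "node1", "node2", "node2"], 4, 2)

print(spread)
print(packed)
```

Under these assumptions, the order of lines in the machinefile decides whether the processes of one replica share a node or are split across nodes.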

>
>Apart from that, you need to treat each replica with its processes as an individual simulation. So if you do not specify +devices, it will use all usable devices on the nodes it runs on and automatically do the core<->GPU assignment.
>

>
>Norman Geist.
>

>
>From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Michael Feig
>Sent: Tuesday, April 14, 2015 12:09 AM
>To: namd-l_at_ks.uiuc.edu
>Subject: namd-l: REMD on GPU cluster
>

>
>Basically my question is how to run NAMD with REMD across multiple nodes each with 1 or 2 GPUs.
>

>
>I have seen previous posts but I am unsure about how exactly to go about it.
>

>
>More specifically, for optimal performance it seems that I would want each replica to use 1 GPU plus maybe 10 CPU cores. If I then wanted to run 10 replicas, would I simply request 10 GPUs and 100 cores (or 10 GPU nodes exclusively) and then use mpirun? How will NAMD figure out which cores ‘belong’ to which GPU, or do I not need to worry about it? Does anybody have experience with REMD on such a cluster with a recent NAMD version (2.10 or newer)?
>

>
>Thanks.
>

>
>-------------------------------------------------------
>
>Michael Feig, Ph.D.
>
>Professor
>
>Biochemistry & Molecular Biology; Chemistry
>
>Michigan State University
>
>feig_at_msu.edu
>

>
