Re: slow down when running 2 simulations on 1 node

From: David Hardy (dhardy_at_ks.uiuc.edu)
Date: Mon Dec 16 2019 - 12:36:12 CST

Next message: Geordano Palacios: "Cuda"
Previous message: Geordano Palacios: "Gradual minimization"
In reply to: Gerald Keller: "Re: slow down when running 2 simulations on 1 node"
Next in thread: Geordano Palacios: "Cuda"
Reply: Geordano Palacios: "Cuda"
Reply: Gerald Keller: "Re: slow down when running 2 simulations on 1 node"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Hi Gerald,

I think your slowdown might be due to each process accidentally using both GPUs.

By default, NAMD will use all devices that it finds. Add "+devices 0" to the first NAMD invocation to restrict it to GPU 0, and "+devices 1" to the second to restrict it to GPU 1.
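A minimal sketch of that suggestion as a Python launcher, assuming two GPUs. The config file names `replica0.conf`/`replica1.conf`, the helper `build_namd_cmd` and the core split are hypothetical; only the `+p`, `+setcpuaffinity`, `+pemap`, `+idlepoll` and `+devices` options come from this thread. Each replica gets exactly one GPU and its own, non-overlapping set of CPUs:

```python
def build_namd_cmd(conf, device, cores):
    """Return a namd2 argument list pinned to one GPU and a fixed set of CPUs.

    conf   -- NAMD config file (the names used below are hypothetical)
    device -- GPU index passed to +devices
    cores  -- logical CPU ids reserved for this replica only
    """
    return [
        "namd2",
        "+p", str(len(cores)),                # one PE per reserved CPU
        "+setcpuaffinity",
        "+pemap", ",".join(map(str, cores)),  # explicit, non-overlapping CPU list
        "+idlepoll",
        "+devices", str(device),              # use only this GPU
        conf,
    ]

# Hypothetical split, assuming logical CPUs 0-23 are 24 distinct physical cores.
cmds = [
    build_namd_cmd("replica0.conf", 0, list(range(0, 12))),
    build_namd_cmd("replica1.conf", 1, list(range(12, 24))),
]

for cmd in cmds:
    print(" ".join(cmd))

# To launch both replicas from Python instead of two shells:
# import subprocess
# procs = [subprocess.Popen(cmd) for cmd in cmds]
# for proc in procs:
#     proc.wait()
```

Printing the commands first makes it easy to check by eye that the two `+pemap` lists and `+devices` values really differ before anything is started.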

NAMD is already CPU-intensive enough on each thread that it generally does not benefit from hyperthreading.
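If hyperthreading stays enabled, one option is to give each replica only one logical CPU per physical core. A sketch, assuming Linux sysfs (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`); the helper names are hypothetical, and the even split between two replicas is only an illustration, not a NAMD recommendation. Sibling numbering differs between machines, which is why it is read rather than assumed:

```python
import glob
import re


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,14" or "0-1" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_siblings():
    """Map each online logical CPU to its hyperthread siblings (Linux sysfs)."""
    siblings = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            siblings[cpu] = parse_cpu_list(f.read())
    return siblings


def one_cpu_per_core(siblings):
    """Keep the lowest-numbered logical CPU of each physical core."""
    return sorted({min(sibs) for sibs in siblings.values()})


if __name__ == "__main__":
    cores = one_cpu_per_core(read_siblings())
    half = len(cores) // 2
    print("replica 0 +pemap", ",".join(map(str, cores[:half])))
    print("replica 1 +pemap", ",".join(map(str, cores[half:])))
```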

Best regards,
Dave

--
David J. Hardy, Ph.D.
Beckman Institute
University of Illinois at Urbana-Champaign
405 N. Mathews Ave., Urbana, IL 61801
dhardy_at_ks.uiuc.edu, http://www.ks.uiuc.edu/~dhardy/
> On Dec 14, 2019, at 9:51 AM, Gerald Keller <gerald.keller_at_uni-wuerzburg.de> wrote:
> 
> Thank you all for your suggestions! 
> 
> I tried setting CPU affinity, but the simulation still slows down when the second replica starts. 
> 
> On a node with an Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz (1 socket, 14 cores, 28 with hyperthreading), I tried:
> 
> For the first replica on GPU 0 I used: namd2 +setcpuaffinity +pemap 0-11 +p 12 +idlepoll
> The second on GPU 1: namd2 +setcpuaffinity +pemap 11-23 +p 12 +idlepoll
> 
> I also tried: 
> 
> 1st replica: namd2 +setcpuaffinity +pemap 0-11:2 +p 6 +idlepoll
> 2nd replica: namd2 +setcpuaffinity +pemap 11-23:2 +p 6 +idlepoll
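The pemaps reported above can be checked mechanically. A small sketch (helper names hypothetical; it handles only the `N`, `A-B` and `A-B:S` item forms of the Charm++ pemap syntax) that expands each map, compares its length with `+p`, and reports logical CPUs claimed by both replicas:

```python
def expand_pemap(pemap):
    """Expand a Charm++-style pemap such as "0-11:2" into logical CPU ids.

    Only the N, A-B and A-B:S item forms are handled (comma-separated);
    the A-B:S.R run form is not.
    """
    cpus = []
    for item in pemap.split(","):
        rng, _, stride = item.partition(":")
        lo, _, hi = rng.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1, int(stride or 1)))
    return cpus


def report(replicas):
    """replicas maps a name to (pemap, +p); returns CPUs claimed by >1 replica."""
    owners = {}
    for name, (pemap, p) in replicas.items():
        cpus = expand_pemap(pemap)
        if len(cpus) != p:
            print(f"{name}: +pemap {pemap} lists {len(cpus)} CPUs but +p is {p}")
        for cpu in cpus:
            owners.setdefault(cpu, []).append(name)
    shared = sorted(cpu for cpu, names in owners.items() if len(names) > 1)
    print("shared logical CPUs:", shared if shared else "none")
    return shared


# The two attempts quoted above.
first = report({"1st replica": ("0-11", 12), "2nd replica": ("11-23", 12)})
second = report({"1st replica": ("0-11:2", 6), "2nd replica": ("11-23:2", 6)})
```

For the first attempt this reports that `11-23` lists 13 CPUs for `+p 12` and that CPU 11 is used by both replicas; for the second, `11-23:2` lists 7 CPUs for `+p 6` and nothing is shared. Distinct logical CPU ids can still be hyperthread siblings of the same physical core, which this check cannot see.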
> 
> Giacomo mentioned that hyperthreading has to be disabled. I thought NAMD supported hyperthreading? 
> 
> 
> Best regards
> Gerald
> 
> 
> >>> Giacomo Fiorin <giacomo.fiorin_at_gmail.com> 12.12.19 20:27 >>>
> Hello Gerald, I would go with Victor's and Julio's suggestions, but also make sure that HyperThreading is disabled, i.e. that there are 40 physical CPU cores and not 20. In /proc/cpuinfo, look for the keyword "ht" among the CPU features.
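The "ht" flag may be listed even when hyperthreading is switched off in the BIOS, so comparing the `siblings` (logical CPUs per package) and `cpu cores` (physical cores per package) fields of /proc/cpuinfo is a more direct check. A sketch, assuming an x86 Linux kernel that reports both fields; the function name is hypothetical:

```python
import os


def smt_active(cpuinfo_text):
    """Return True if hyperthreading (SMT) appears active, False if not,
    or None if the needed fields are missing.

    Compares "siblings" with "cpu cores" for the first processor entry.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())  # keep first entry
    if "siblings" not in fields or "cpu cores" not in fields:
        return None  # e.g. some non-x86 kernels omit these fields
    return int(fields["siblings"]) > int(fields["cpu cores"])


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("hyperthreading active:", smt_active(f.read()))
```

On the i9-7940X described earlier in the thread (14 cores, 28 threads) with hyperthreading on, one would expect `siblings : 28` and `cpu cores : 14`.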
> 
> It is likewise good to keep in mind that unless a program runs entirely on the GPU, transferring data between the GPU and the CPU goes via circuitry that is most of the time shared among the devices on one motherboard.
> 
> Giacomo
> 
> On Thu, Dec 12, 2019 at 2:14 PM Julio Maia <jmaia_at_ks.uiuc.edu> wrote:
> Hi, 
> If you’re not setting the correct affinities, PEs from different replicas might compete for the same cores in your machine.
> Please set CPU affinities for the PEs of each replica and try again. You can see how it's done here: https://www.ks.uiuc.edu/Research/namd/2.13/ug/node105.html
> Thanks,
> 
> 
> On Dec 12, 2019, at 2:09 AM, Gerald Keller <gerald.keller_at_uni-wuerzburg.de> wrote:
> 
> Hi everyone, 
> 
> In our working group we run simulations on our own GPU nodes, without a queue system, and we do not run across multiple nodes. 
> When we ran two replicas of plain MD on one node with 2 GPUs and 40 CPUs in total, we noticed that the simulation speed drops as soon as the second replica starts. 
> 
> 1x NAMD on 1 node using 1 GPU and 18 CPUs:
> 
> Info: Benchmark time: 18 CPUs 0.00742875 s/step 
> Info: Benchmark time: 18 CPUs 0.0073947 s/step 
> Info: Benchmark time: 18 CPUs 0.00747593 s/step
> Info: Benchmark time: 18 CPUs 0.00752931 s/step
> Info: Benchmark time: 18 CPUs 0.00744549 s/step
> Info: Benchmark time: 18 CPUs 0.00746218 s/step
> 
> TIMING: 500  CPU: 3.86542, 0.0073741/step  Wall: 3.90971, 0.0074047/step
> TIMING: 980  CPU: 7.43293, 0.00730715/step  Wall: 7.49914, 0.00738945/step
> TIMING: 1000  CPU: 7.58503, 0.007605/step  Wall: 7.65193, 0.0076393/step
> TIMING: 1500  CPU: 11.2973, 0.0073617/step  Wall: 11.3969, 0.00763561/step
> TIMING: 2000  CPU: 15.0195, 0.00745355/step  Wall: 15.1411, 0.0075375/step
> 
> 
> 2x NAMD on 1 node, 1 GPU and 18 CPUs for each replica:
> 
> Info: Benchmark time: 18 CPUs 0.0115988 s/step
> Info: Benchmark time: 18 CPUs 0.0116316 s/step
> Info: Benchmark time: 18 CPUs 0.0118586 s/step
> Info: Benchmark time: 18 CPUs 0.0115375 s/step
> Info: Benchmark time: 18 CPUs 0.0114114 s/step
> Info: Benchmark time: 18 CPUs 0.0117798 s/step
> 
> TIMING: 500  CPU: 6.0915, 0.0113823/step  Wall: 6.18421, 0.0114815/step
> TIMING: 1000  CPU: 11.8594, 0.0126053/step  Wall: 12.0109, 0.0127244/step
> TIMING: 1500  CPU: 17.564, 0.0114935/step  Wall: 17.7579, 0.0116048/step
> TIMING: 2000  CPU: 23.3157, 0.0119276/step  Wall: 23.5628, 0.0119936/step
> 
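To put a number on the slowdown, a short sketch that averages the "Benchmark time" lines quoted above (values copied from this message; the regex assumes exactly the `Info: Benchmark time: N CPUs X s/step` format shown):

```python
import re
from statistics import mean

# "Benchmark time" lines copied from the two runs quoted above.
ONE_REPLICA = """
Info: Benchmark time: 18 CPUs 0.00742875 s/step
Info: Benchmark time: 18 CPUs 0.0073947 s/step
Info: Benchmark time: 18 CPUs 0.00747593 s/step
Info: Benchmark time: 18 CPUs 0.00752931 s/step
Info: Benchmark time: 18 CPUs 0.00744549 s/step
Info: Benchmark time: 18 CPUs 0.00746218 s/step
"""

TWO_REPLICAS = """
Info: Benchmark time: 18 CPUs 0.0115988 s/step
Info: Benchmark time: 18 CPUs 0.0116316 s/step
Info: Benchmark time: 18 CPUs 0.0118586 s/step
Info: Benchmark time: 18 CPUs 0.0115375 s/step
Info: Benchmark time: 18 CPUs 0.0114114 s/step
Info: Benchmark time: 18 CPUs 0.0117798 s/step
"""

BENCH_RE = re.compile(r"Benchmark time: \d+ CPUs ([0-9.]+) s/step")


def mean_s_per_step(log):
    """Average the s/step values of all NAMD 'Benchmark time' lines in a log."""
    return mean([float(x) for x in BENCH_RE.findall(log)])


single = mean_s_per_step(ONE_REPLICA)
double = mean_s_per_step(TWO_REPLICAS)
print(f"1 replica : {single:.5f} s/step")
print(f"2 replicas: {double:.5f} s/step")
print(f"slowdown  : {double / single:.2f}x")
```

With these numbers each step takes roughly 1.56 times as long once the second replica is running.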
> If we run 1x NAMD on 1 node using 1 GPU and 18 CPUs and start another simulation with Amber on the other GPU, the NAMD simulation speed is not affected. 
> 
> Does anyone have an idea why this is happening and how to solve the problem? Because of limited resources, we sometimes have to run only one simulation per GPU. 
> 
> Thank you in advance for your suggestions!
> 
> Best regards
> Gerald
> 
> 
> 
> -- 
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Research collaborator, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin


This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:21:04 CST