Re: replica exchange and GPU acceleration

From: Mitchell Gleed (aliigleed16_at_gmail.com)
Date: Mon Jul 13 2015 - 01:32:14 CDT

Next message: Anjela Manandhar: "UNABLE TO FIND ANGLE PARAMETERS FOR HT HT OT"
Previous message: Norman Geist: "RE: replica exchange and GPU acceleration"
Maybe in reply to: Mitchell Gleed: "Re: replica exchange and GPU acceleration"

While I've known about their utility, I neglected to use the twoAwayX
and +idlepoll options for the test case here. Now that I have enabled
them, the 16core/16replica/4gpu case runs faster than the same case without
GPUs, but only modestly: 1.565 days/ns compared to 1.871 days/ns, about 16%
less wall-clock time per ns. I had hoped the GPUs would accelerate REMD a
little more, but I think the small system size probably limits the gains
GPUs typically give. I'm just pleased this is working and that the
1replica:1gpu case works so well.
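Because days/ns is wall-clock time per simulated nanosecond (lower is better), "X% faster" can mean either time saved or throughput gained. A small sketch that computes both from two days/ns figures; `compare_days_per_ns` is a hypothetical helper name, not part of NAMD:

```shell
#!/usr/bin/env bash
# compare_days_per_ns: hypothetical helper for illustration only.
# days/ns = wall-clock days per simulated nanosecond (lower is better).
compare_days_per_ns() {
    local baseline=$1 candidate=$2
    awk -v b="$baseline" -v c="$candidate" 'BEGIN {
        # percent of wall-clock time saved per simulated ns
        saved = (b - c) / b * 100
        # throughput ratio: ns/day of candidate over ns/day of baseline
        ratio = b / c
        printf "time saved: %.1f%%, throughput: %.2fx\n", saved, ratio
    }'
}

# CPU-only vs GPU run of the 16core/16replica/4gpu case
compare_days_per_ns 1.871 1.565
```

For 1.871 vs 1.565 days/ns this prints `time saved: 16.4%, throughput: 1.20x`.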

Thanks again, Norman

A quick summary for anyone who finds this thread in the future:
- I struggled to get GPUs to work with the NAMD REMD example cases; it's
possible that twoAwayX might help there
- the GPUs *do* work for a larger REMD system with ~30k atoms
- GPU acceleration loses value as the number of replicas per GPU
increases (worth benchmarking to confirm it pays off for your case)
- adding +idlepoll (command line) and twoAwayX (conf file) improved
performance in GPU-accelerated REMD simulations
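Since twoAwayX goes in the config file, one way to benchmark a run with and without it is to write a second copy of the config. A minimal sketch, assuming the file you pass is the one NAMD reads its simulation parameters from; `enable_two_away_x` and the file names are hypothetical. The setting is put at the top so it is read before any `run` command in the file, and any existing twoAwayX line is dropped (NAMD keywords are case-insensitive):

```shell
#!/usr/bin/env bash
# enable_two_away_x: hypothetical helper. Copies a NAMD config, removes any
# existing twoAwayX line, and puts "twoAwayX yes" on the first line.
enable_two_away_x() {
    local in=$1 out=$2
    awk '
        BEGIN { print "twoAwayX yes" }
        # match the keyword in any capitalization
        tolower($1) == "twoawayx" { next }
        { print }
    ' "$in" > "$out"
}

# usage (hypothetical file names):
# enable_two_away_x sim.conf sim_twoaway.conf
```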

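Here is the bash syntax I used to successfully launch the REMD simulations
with GPUs:
# 16 MPI ranks with 1 rank per replica -> 16 replicas
export cores=16
export coresPerReplica=1
export replicas=$(( cores / coresPerReplica ))
# one output directory per replica, matching the +stdout pattern below
for (( i = 0; i < replicas; i++ )); do mkdir -p output/$i; done
# +idlepoll: poll the GPU for results instead of sleeping while idle
# %d in the +stdout pattern is replaced by the replica index
mpirun -np $cores namd2 +replicas $replicas +idlepoll job0.conf \
    +stdout output/%d/job0_%d.log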
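To compare setups like the ones in this thread, the days/ns figure can be pulled from each replica's log. A sketch, assuming NAMD's usual `Info: Benchmark time: ... days/ns ...` log lines and the output/<replica>/job0_<replica>.log layout from the command above; `report_days_per_ns` is a hypothetical name. It prints the last reported value per log:

```shell
#!/usr/bin/env bash
# report_days_per_ns: hypothetical helper. For each replica log, print the
# last days/ns value found on a "Benchmark time:" line.
report_days_per_ns() {
    local dir=${1:-output}
    local log
    for log in "$dir"/*/job0_*.log; do
        [ -e "$log" ] || continue   # glob matched nothing
        awk -v name="$log" '
            /Benchmark time:/ {
                # the value is the field just before the "days/ns" token
                for (i = 2; i <= NF; i++) if ($i == "days/ns") v = $(i - 1)
            }
            END { if (v != "") print name, v }
        ' "$log"
    done
}

# usage: report_days_per_ns output
```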

On Sun, Jul 12, 2015 at 10:51 PM, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:

> Good to hear you got it running so far.
>
> The error about “requires at least one patch per process” simply means
> that your system was too small for that amount of computing resources. This
> can be overcome by setting “twoawayx yes” in the config file, which
> artificially increases the number of patches (box slices) for
> parallelization.
>
> For small systems you should always check the impact of “twoawayx yes”;
> it usually gives about a two-fold speedup on GPUs.
>
> This might also improve your desired 16core+16replica+4gpu case.
>
> Also try passing +idlepoll to the namd2 binary, which can also give a
> two-fold speedup and never hurts.
>
> Norman Geist.
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On Behalf Of* Mitchell Gleed
> *Sent:* Monday, July 13, 2015 5:37 AM
> *To:* NAMD list; Norman Geist
> *Subject:* Re: namd-l: replica exchange and GPU acceleration
>
> Sorry for the late reply; I've been using the university supercomputer's
> GPU nodes for other simulations for the past week and couldn't test this
> until those simulations finished.
>
> Since the GPU nodes have 24 cores, I can't run 16 replicas with 32
> processes, so I followed your suggestion to run 4 replicas with 8
> processes. With this setup, I started getting the error "CUDA-enabled NAMD
> requires at least one patch per thread" for the namd/lib/replica/example
> test case.
>
> I thought the error might mean CUDA-enabled NAMD only works with a PME
> system, so I made a PME test case by adapting the lib/replica/umbrella-2d
> case. I'm now able to get the GPUs to accelerate the replica exchange
> simulations, even with 1 replica:1 GPU:1 process. However, I've found the
> GPUs only help if there's one GPU per replica; when the number of replicas
> exceeds the number of GPUs, the simulations run slower with the GPUs than
> without. I assume that might just be the way things have to be, but if
> there's anything else I can try to get my ideal case of
> 16replicas:16procs:4gpu to benefit from the GPUs, that'd be great.
>
> Here are the benchmark results for the ~30k atom system I tested, in case
> anyone's interested (days/ns, lower is better):
>
> Replicas  Processes  GPUs  days/ns
> 4         4          0     1.61468
> 4         4          4     0.669901
> 4         8          0     1.11726
> 4         8          4     0.445677
> 4         16         0     1.03864
> 16        16         0     1.87094
> 16        16         4     2.52038
>
> Thanks for your help, Norman.
>
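The replicas-per-GPU effect in the benchmark table quoted above is easiest to see as a ratio of CPU-only days/ns to 4-GPU days/ns for each setup that has both. A sketch; `gpu_speedups` is a hypothetical name and the input rows are copied from that table:

```shell
#!/usr/bin/env bash
# gpu_speedups: hypothetical helper. Reads "replicas procs gpus days_per_ns"
# rows and prints (0-GPU days/ns) / (4-GPU days/ns) per setup that has both;
# > 1 means the GPUs helped, < 1 means the GPU run was slower.
gpu_speedups() {
    awk '
        {
            key = $1 " replicas, " $2 " procs"
            t[key, $3] = $4
            seen[key] = 1
        }
        END {
            for (k in seen)
                if (((k, 0) in t) && ((k, 4) in t))
                    printf "%s: %.2fx\n", k, t[k, 0] / t[k, 4]
        }
    ' | sort -n   # awk array order is unspecified, so sort the output
}

gpu_speedups <<'EOF'
4 4 0 1.61468
4 4 4 0.669901
4 8 0 1.11726
4 8 4 0.445677
4 16 0 1.03864
16 16 0 1.87094
16 16 4 2.52038
EOF
```

With these values it reports 2.41x and 2.51x for the 4-replica runs but 0.74x for 16 replicas on 4 GPUs, i.e. that GPU run was slower than CPU-only.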


This archive was generated by hypermail 2.1.6 : Tue Dec 27 2016 - 23:21:13 CST