Re: Replica exchange simulation with GPU Acceleration

From: Renfro, Michael (Renfro_at_tntech.edu)
Date: Thu Jan 25 2018 - 13:40:28 CST

I can’t speak for running replicas as such, but my usual way of running on a single node with GPUs is to use the multicore-CUDA NAMD build, and to run namd2 as:

  namd2 +setcpuaffinity +devices ${GPU_DEVICE_ORDINAL} +p${SLURM_NTASKS} ${INPUT} >& ${OUTPUT}

Where ${GPU_DEVICE_ORDINAL} is “0”, “1”, or “0,1” depending on which GPUs I reserve; ${SLURM_NTASKS} is the number of cores needed; and ${INPUT} and ${OUTPUT} are the NAMD input file and the file recording standard output.
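For illustration, a minimal Slurm batch script along these lines might look like the sketch below. The core/GPU counts and the input/output file names are placeholders, not taken from any real cluster; the script only echoes the assembled command so it can be sanity-checked without NAMD installed.

```shell
#!/bin/bash
#SBATCH --ntasks=8        # CPU cores, exported by Slurm as SLURM_NTASKS
#SBATCH --gres=gpu:1      # one GPU, exposed as device ordinal 0 below

# Placeholder values -- substitute your own reservation and file names.
GPU_DEVICE_ORDINAL=0
INPUT=stmv.namd
OUTPUT=stmv.log

# Assemble the namd2 invocation described above. SLURM_NTASKS falls
# back to 8 when the script is run outside of Slurm.
CMD="namd2 +setcpuaffinity +devices ${GPU_DEVICE_ORDINAL} +p${SLURM_NTASKS:-8} ${INPUT}"
echo "$CMD"
# ${CMD} >& "${OUTPUT}"   # uncomment on a node where namd2 is on PATH
```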

Using HECBioSim’s 3M-atom benchmark model, a single K80 card (presented as 2 distinct GPUs) could keep 8 E5-2680v4 CPU cores busy. But with 2 GPUs, 16 or 28 cores (the maximum on a single node of ours) was hardly any faster than 8 cores.

-- 
Mike Renfro  / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University
> On Jan 25, 2018, at 12:59 PM, Souvik Sinha <souvik.sinha893_at_gmail.com> wrote:
> 
> Thanks for your reply.
> I was wondering why 'idlepoll' can't even get the GPUs to work, despite the likelihood of poor performance.
> 
> On 25 Jan 2018 19:53, "Giacomo Fiorin" <giacomo.fiorin_at_gmail.com> wrote:
> Hi Souvik, this seems connected to the compilation options.  Compiling with MPI + SMP + CUDA used to give very poor performance, although I haven't tried with the new CUDA kernels (2.12 and later).
> 
> Giacomo
> 
> On Thu, Jan 25, 2018 at 4:02 AM, Souvik Sinha <souvik.sinha893_at_gmail.com> wrote:
> NAMD Users,
> 
> I am trying to run replica exchange ABF simulations on a machine with 32 cores and 2 Tesla K40 cards. NAMD 2.12, compiled from source, is what I am using. 
> 
> From this earlier thread, http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2014-2015/2490.html, I found out that using "twoAwayX" or "idlepoll" might help engage the GPUs, but in my situation neither is getting the GPUs to work ("twoAwayX" is speeding up the jobs, though). The 'idlepoll' switch generally works fine with CUDA builds of NAMD for non-replica jobs. From the aforesaid thread, I gather that running 4 replicas on 32 CPUs and 2 GPUs may not provide a big boost to my simulations, but I just want to check whether it works or not.
> 
> The command I am running for the job:
> mpirun -np 32 /home/sgd/program/NAMD_2.12_Source/Linux-x86_64-g++/namd2 +idlepoll  +replicas 4  $inputfile +stdout log/job0.%d.log
> 
> My understanding is not helping me much, so any advice will be helpful.
> 
> Thank you 
> 
> -- 
> Souvik Sinha
> Research Fellow
> Bioinformatics Centre (SGD LAB)
> Bose Institute
> 
> Contact: 033 25693275
> 
> 
> 
> -- 
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:48 CST