Re: Enabling GPU for Replica exchange umbrella sampling

From: Abhishek TYAGI (atyagiaa_at_connect.ust.hk)
Date: Mon Jan 25 2016 - 01:49:23 CST

Dear Norman,

Thank you for the suggestion. I contacted Jim Phillips, who helped me solve this problem by providing a binary. The problem was that I could not combine GPU with CPU because I did not have a CUDA-enabled build for REUS: I was using the NAMD_2.11-netlrts version, as there was a problem building Charm++ on the server I am working on. For people facing a similar problem, here is the solution I received (the binaries were attached):

******************************************************************************************************************

From: Jim Phillips <jim_at_ks.uiuc.edu>
Sent: Sunday, January 24, 2016 5:21 AM
To: Abhishek TYAGI
Cc: charm_at_cs.illinois.edu
Subject: Re: [ppl] [charm] Charm install Error

You can use this binary:
http://www.ks.uiuc.edu/~jim/tmp/NAMD_2.11_Linux-x86_64-netlrts-CUDA.tar.gz

You will need to use the new "+devicesperreplica" option so that each
replica only uses one device and they do not all use the same device.

I suggest the following options (assuming you want 16 replicas):

./charmrun ++local +p16 ./namd2 +devicesperreplica 1 +replicas 16 +idlepoll +pemap 0-15

Jim

********************************************************************************************************************
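For reference, the suggested command line can be parameterized by the replica count. The sketch below is only an illustration: the function names build_reus_cmd and replicas_per_gpu are hypothetical, the command is printed rather than executed, and the even spread of replicas over the GPUs is an assumption, since the message above only says that each replica uses one device. The flags are the ones quoted in this thread.

```shell
# Hypothetical helper: print (do not run) the charmrun command line suggested
# above for a given replica count. Any extra arguments (for example the config
# file and +stdout pattern) are appended unchanged.
build_reus_cmd() (
    replicas=$1
    shift
    cmd="./charmrun ++local +p${replicas} ./namd2 +devicesperreplica 1 +replicas ${replicas} +idlepoll +pemap 0-$(( replicas - 1 ))"
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    printf '%s\n' "$cmd"
)

# Hypothetical helper: replicas sharing each GPU, ASSUMING an even spread
# (rounded up). How NAMD actually assigns devices is not stated in the thread.
replicas_per_gpu() (
    replicas=$1
    gpus=$2
    echo $(( (replicas + gpus - 1) / gpus ))
)

# 16 replicas on the 4-GPU node described in this thread
build_reus_cmd 16 job0.conf +stdout 'output/%d/job0.%d.log'
replicas_per_gpu 16 4
```

The suggested line does not show the configuration file; appending job0.conf and the +stdout pattern from the original command, as done in the last lines of the sketch, is presumably still needed.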

Thanks and regards

Abhi

Abhishek Tyagi

PhD Student

Chemical and Biomolecular Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

________________________________
From: Norman Geist <norman.geist_at_uni-greifswald.de>
Sent: Monday, January 25, 2016 3:06 PM
To: Abhishek TYAGI
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: Enabling GPU for Replica exchange umbrella sampling

Ok, but the error you posted

Fatal error on Partition 0 PE 0> REPLICA 0 FATAL ERROR: Unknown command-line option +devices

just tells you that you didn't use a CUDA-enabled NAMD.

So what error or problem comes up when using the proper binary?
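A quick way to check both points is sketched below. This is only a heuristic under stated assumptions: the function names are hypothetical, the name check relies on the directory naming seen in this thread (NAMD_2.11_Linux-x86_64-netlrts versus NAMD_2.11_Linux-x86_64-netlrts-CUDA), and the log check simply searches for the error text quoted above.

```shell
# Hypothetical helper: succeeds if the build directory name contains "CUDA".
# This only inspects the name; it does not prove the binary supports GPUs.
looks_like_cuda_build() {
    case "$1" in
        *CUDA*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds if a log file contains the error quoted above,
# which indicates the binary did not recognize the +devices option.
log_shows_unknown_devices_option() {
    grep -F -q 'FATAL ERROR: Unknown command-line option +devices' "$1"
}

# Example with the directory name used in the failing command
if looks_like_cuda_build "NAMD_2.11_Linux-x86_64-netlrts"; then
    echo "name suggests a CUDA build"
else
    echo "name does not suggest a CUDA build"
fi
```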

Norman Geist

From: Abhishek TYAGI [mailto:atyagiaa_at_connect.ust.hk]
Sent: Saturday, January 23, 2016 8:05 PM
To: Norman Geist <norman.geist_at_uni-greifswald.de>
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: Enabling GPU for Replica exchange umbrella sampling

Hi Norman,

I tried using a CUDA-enabled NAMD on the GPU cluster. I can run normal simulations, but when I try to use this version for replica-exchange umbrella sampling, it fails. I even tried to build from source on the cluster and got the same error. I previously discussed this issue on the mailing list and ended up with the recent version of NAMD that I am using now.

The cluster node has 4 K20 GPUs and 2 CPUs (32 cores).

I tried other versions, but was unable to execute the command.

Thanks

Abhi

Abhishek Tyagi

PhD Student

Chemical and Biomolecular Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

________________________________

From: Norman Geist <norman.geist_at_uni-greifswald.de>
Sent: Friday, January 22, 2016 5:09 PM
To: Abhishek TYAGI
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: Enabling GPU for Replica exchange umbrella sampling

BTW, you need to use a CUDA-enabled build of NAMD.

Norman Geist

From: Abhishek TYAGI [mailto:atyagiaa_at_connect.ust.hk]
Sent: Friday, January 22, 2016 5:03 AM
To: Namd Mailing List <namd-l_at_ks.uiuc.edu>
Cc: norman.geist_at_uni-greifswald.de; aliigleed16_at_gmail.com
Subject: Re: Enabling GPU for Replica exchange umbrella sampling

Hi,

I am working on the tutorial "One-dimensional replica-exchange umbrella sampling".

I tried to follow the script mentioned at http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2014-2015/2522.html but I am still not able to combine GPU with CPU on the cluster.

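# Total CPU cores available and cores assigned to each replica
export cores=16
export coresPerReplica=1
# Number of replicas = cores / coresPerReplica (integer division)
export replicas=$(( cores / coresPerReplica ))
# Create one output directory per replica
for (( i = 0; i < replicas; i++ )); do mkdir -p "output/$i"; done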
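The replica count in this script comes from an integer division, so an uneven split would be silently truncated. A minimal guard is sketched below, assuming that an uneven split should be rejected rather than rounded; the function name replica_count is hypothetical and not part of the referenced script.

```shell
# Hypothetical helper: print the number of replicas for a given core count,
# or fail if the cores cannot be divided evenly among the replicas.
replica_count() (
    cores=$1
    cores_per_replica=$2
    if [ "$cores_per_replica" -le 0 ] || [ $(( cores % cores_per_replica )) -ne 0 ]; then
        echo "error: $cores cores cannot be split evenly into groups of $cores_per_replica" >&2
        exit 1
    fi
    echo $(( cores / cores_per_replica ))
)

# 32 cores with 2 cores per replica, as in the original post of this thread
replica_count 32 2
```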

Any help appreciated.

Thanks

Abhi

Abhishek Tyagi

PhD Student

Chemical and Biomolecular Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

________________________________

From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of Abhishek TYAGI <atyagiaa_at_connect.ust.hk>
Sent: Thursday, January 21, 2016 11:12 AM
To: Namd Mailing List
Subject: namd-l: Enabling GPU for REUS

Hi,

I am using REUS for a system of 42k atoms, and I have a problem improving the benchmark time of the MD. I have access to one cluster node (4 K20 GPUs and 32 CPU cores), so I can run REUS with 16 windows (2 cores per window).

I checked the previously discussed link http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2014-2015/2522.html, which is helpful to some extent, but I am not using MPI. The command I am using is:

NAMD_2.11_Linux-x86_64-netlrts/charmrun NAMD_2.11_Linux-x86_64-netlrts/namd2 ++ppn 2 +p32 +replicas 16 job0.conf +stdout output/%d/job0.%d.log

After executing the above command, REUS runs on the 32 CPU cores, not on the GPUs. The benchmark time right now is 10 days/ns.

Then I modified the command and added +devices 0,1,2,3:

NAMD_2.11_Linux-x86_64-netlrts/charmrun NAMD_2.11_Linux-x86_64-netlrts/namd2 +devices 0,1,2,3 ++ppn 2 +p32 +replicas 16 job0.conf +stdout output/%d/job0.%d.log

It produces this error:

Charmrun> scalable start enabled.
Charmrun> started all node programs in 5.236 seconds.
Charm++> Running in non-SMP mode: numPes 32
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Charm++> TORUS A SIZE 32 USING 0 1 2 3 4 5 6 7 8 9...
Charm++> TORUS B SIZE 1 USING 0
Charm++> TORUS C SIZE 1 USING 0
Charm++> TORUS MINIMAL MESH SIZE IS 32 BY 1 BY 1
Redirecting stdout to files output/0/job0.0.log through 15
------- Partition 0 Processor 0 Exiting: Called CmiAbort ------
Reason: REPLICA 0 FATAL ERROR: Unknown command-line option +devices

Fatal error on Partition 0 PE 0> REPLICA 0 FATAL ERROR: Unknown command-line option +devices

I want to use the GPUs together with the CPUs, but I am not able to do it. Can anyone suggest how to solve this problem?

Thanks in advance
Abhi

Abhishek Tyagi

PhD Student

Chemical and Biomolecular Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

This archive was generated by hypermail 2.1.6 : Tue Dec 27 2016 - 23:21:46 CST