I am trying to run replica-exchange ABF simulations on a machine with 32
cores and 2 Tesla K40 cards, using NAMD 2.12 compiled from source.

From this earlier thread, I found that using "twoAwayX" or "idlepoll" might
help get the GPUs working, but in my case neither one does ("twoAwayX" does
speed up the jobs, though). The "idlepoll" switch generally works fine with
CUDA builds of NAMD for non-replica jobs. I also gathered from that thread
that running 4 replicas on 32 CPU cores and 2 GPUs may not give my
simulations a big boost, but I just want to check whether the GPUs work
with this setup or not.

I am running the job with this command:
mpirun -np 32 /home/sgd/program/NAMD_2.12_Source/Linux-x86_64-g++/namd2 \
    +idlepoll +replicas 4 $inputfile +stdout log/job0.%d.log
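
To make the numbers in that command concrete, here is a small Python sketch
of the layout I would naively expect from `-np 32 +replicas 4` on 2 GPUs:
8 ranks per replica, replicas alternating between the two cards, and one
log file per replica from the `%d` in `+stdout`. The even split and the
round-robin GPU assignment are my own assumptions for illustration, not
documented NAMD behaviour, and `replica_layout` is a made-up helper.

```python
# Hypothetical helper: ASSUMES ranks are split evenly across replicas and
# replicas are assigned to GPUs round-robin. Illustration only; this is not
# NAMD's documented device-assignment logic.

def replica_layout(n_procs: int, n_replicas: int, n_gpus: int) -> list:
    if n_procs % n_replicas != 0:
        raise ValueError("process count must be divisible by replica count")
    per_replica = n_procs // n_replicas  # 32 // 4 = 8 ranks per replica
    layout = []
    for r in range(n_replicas):
        first = r * per_replica
        layout.append({
            "replica": r,
            "ranks": range(first, first + per_replica),
            "gpu": r % n_gpus,             # assumed round-robin over cards
            "log": f"log/job0.{r}.log",    # %d in +stdout -> replica index
        })
    return layout


if __name__ == "__main__":
    for entry in replica_layout(32, 4, 2):
        ranks = entry["ranks"]
        print(f"replica {entry['replica']}: ranks {ranks.start}-{ranks.stop - 1}, "
              f"GPU {entry['gpu']}, {entry['log']}")
```

Under these assumptions each K40 would be shared by two replicas of 8 ranks
each, which is part of why the speed-up may be modest.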

My own understanding of this is limited, so any advice would be helpful.

