From: Hyun (biophysics1_at_gmail.com)
Date: Thu Mar 16 2017 - 10:10:19 CDT
Dear NAMD users and developers.
1. I compiled the GPU version of NAMD on the ACCRE cluster (http://www.accre.vanderbilt.edu/) as follows.
Build and test the Charm++/Converse library
cd charm-6.7.1
./build charm++ verbs-linux-x86_64 gcc smp --with-production
Set up build directory and compile:
./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-smp-gcc
The build appeared to finish normally, with no error messages.
I wrote a batch (Slurm) script to test NAMD on the GPU nodes, but I got error messages.
I am not sure how to generate the nodelist file, but I tried it.
Could you check it and give me some comments?
Thanks
Hyun.
2. Batch (Slurm) script
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=2 # 2 CPU tasks per node
#SBATCH --gres=gpu:2 # 2 GPUs per node
#SBATCH --mem=4G # 4 GB RAM per node
#SBATCH --time=0-01:00:00 # 1 hour wall time
#SBATCH --output=test.out
#SBATCH --job-name="gpu-t"
jobname=test
echo "SLURM_JOBID: " $SLURM_JOBID
echo "SLURM_JOB_NODELIST: " $SLURM_JOB_NODELIST
# generate NAMD nodelist
echo group main > nodelist.$SLURM_JOBID
numcore=4 # number of "host" lines to write; matches +p 4 below
for ((a=1; a <= numcore ; a++))
do
echo "host $SLURM_JOB_NODELIST" >> nodelist.$SLURM_JOBID
done
NAMD_DIR=/home/****/NAMD/NAMD_2.12_Source/Linux-x86_64-g++
$NAMD_DIR/charmrun $NAMD_DIR/namd2 ++remote-shell ssh ++nodelist \
    nodelist.$SLURM_JOBID +p 4 ${jobname}.conf > ${jobname}.log
3. The script generated the following nodelist:
group main
host vmp1247
host vmp1247
host vmp1247
host vmp1247
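The loop in the script writes the single hostname in `$SLURM_JOB_NODELIST` a fixed number of times. That works for one node, but Slurm reports multiple nodes as a compressed host list (for example `vmp[1247-1248]`), which cannot be pasted into a `host` line directly. A minimal sketch of a more general generator follows; the function name `write_nodelist` is hypothetical, and the usage line assumes `scontrol show hostnames` is available on the compute node to expand the Slurm host list. It writes the same `group main` / `host ...` format shown above.

```shell
#!/bin/bash
# Hypothetical helper: write a Charm++ nodelist file.
# Usage: write_nodelist FILE PER_HOST HOST [HOST ...]
# Writes "group main", then PER_HOST "host NAME" lines for each HOST given.
write_nodelist() {
    local file=$1 per_host=$2
    shift 2
    local host i
    {
        echo "group main"
        for host in "$@"; do
            for ((i = 1; i <= per_host; i++)); do
                echo "host $host"
            done
        done
    } > "$file"
}

# Inside a Slurm job (assumes scontrol is on PATH), something like:
#   write_nodelist "nodelist.$SLURM_JOBID" 2 $(scontrol show hostnames "$SLURM_JOB_NODELIST")
```

Passing the hosts as arguments keeps the function independent of Slurm, so it can be checked on a login node with made-up hostnames before being used in a job.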
4. However, I got the following error messages:
SLURM_JOBID: 13411368
SLURM_JOB_NODELIST: vmp1247
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Reconnection attempt 1 of 3
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Reconnection attempt 2 of 3
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Reconnection attempt 3 of 3
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Too many reconnection attempts; bailing out
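The repeated `Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)` lines come from `ssh` itself: with `++remote-shell ssh`, charmrun tries to log in to `vmp1247` (the node the job is already running on), and the non-interactive login is refused. A rough way to confirm this from inside the job is to attempt a non-interactive SSH login to each host in the nodelist. This is a diagnostic sketch, not part of NAMD: `nodelist_hosts` and `check_ssh` are hypothetical helper names, and `BatchMode=yes` makes `ssh` fail immediately instead of prompting for a password.

```shell
#!/bin/bash
# Hypothetical helper: print each distinct host named in a Charm++ nodelist file.
nodelist_hosts() {
    awk '$1 == "host" { print $2 }' "$1" | sort -u
}

# Hypothetical helper: try a non-interactive ssh login to every host in the nodelist.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# Returns non-zero if any host cannot be reached this way.
check_ssh() {
    local host status=0
    for host in $(nodelist_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ssh to $host: ok"
        else
            echo "ssh to $host: FAILED"
            status=1
        fi
    done
    return $status
}

# Example (inside the job): check_ssh "nodelist.$SLURM_JOBID"
```

If the check fails, one route is key-based (passwordless) SSH between compute nodes, which depends on the cluster's policy. For a single-node run like this one, charmrun's `++local` option, which starts the processes on the local machine without a remote shell, may avoid SSH altogether; both are worth confirming against the ACCRE and Charm++ documentation for the build in use.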