generating nodelist and error

From: Hyun (biophysics1_at_gmail.com)
Date: Thu Mar 16 2017 - 10:10:19 CDT

Dear NAMD users and developers.

1. I compiled the GPU version of NAMD on the ACCRE cluster
(http://www.accre.vanderbilt.edu/).

Build and test the Charm++/Converse library
  cd charm-6.7.1
  ./build charm++ verbs-linux-x86_64 gcc smp --with-production

Set up build directory and compile:
  ./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-smp-gcc

The build looked fine and produced no error messages.

I wrote a Slurm batch script to test NAMD on the GPU nodes, but I got error
messages.
I am not sure how to generate the node file, so the script below is my attempt.

Could you check it and give me some comments?

Thanks

Hyun.

2. Slurm batch script

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=2 # 2 CPU processes per node
#SBATCH --gres=gpu:2 # 2 GPUs per node
#SBATCH --mem=4G # 4 GB RAM per node
#SBATCH --time=0-01:00:00 # 1 hour wall time
#SBATCH --output=test.out
#SBATCH --job-name="gpu-t"

jobname=test

echo "SLURM_JOBID: " $SLURM_JOBID
echo "SLURM_JOB_NODELIST: " $SLURM_JOB_NODELIST

# generate NAMD nodelist
echo group main > nodelist.$SLURM_JOBID

numcore=4

for ((a=1; a <= numcore ; a++))
do
  echo "host $SLURM_JOB_NODELIST" >> nodelist.$SLURM_JOBID
done

NAMD_DIR=/home/****/NAMD/NAMD_2.12_Source/Linux-x86_64-g++

$NAMD_DIR/charmrun $NAMD_DIR/namd2 ++remote-shell ssh ++nodelist \
  nodelist.$SLURM_JOBID +p 4 ${jobname}.conf > ${jobname}.log

3. The batch script generated the following nodelist.

group main
host vmp1247
host vmp1247
host vmp1247
host vmp1247
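
One caveat with the loop in the script above: when a job spans several nodes, Slurm stores a compressed expression (for example vmp[1247-1248]) in SLURM_JOB_NODELIST, so echoing it directly would not give one valid hostname per line. A minimal sketch of an alternative follows. It assumes `scontrol` is on the PATH inside the job, and `write_nodelist` is a hypothetical helper name, not part of NAMD, Charm++ or Slurm. It repeats each host a fixed number of times, as the original loop does.

```bash
#!/bin/bash
# Sketch only: write a Charm++ nodelist from hostnames read on stdin.
# write_nodelist is a hypothetical helper, not part of NAMD or Slurm.
#   $1 = output file, $2 = number of "host" lines to write per hostname
write_nodelist() {
  local outfile=$1 per_host=$2 host i
  echo "group main" > "$outfile"
  while read -r host; do
    [ -n "$host" ] || continue          # skip empty lines
    for ((i = 0; i < per_host; i++)); do
      echo "host $host" >> "$outfile"
    done
  done
}

# Inside a Slurm job (assumption: scontrol is available on the compute node):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" \
#     | write_nodelist "nodelist.$SLURM_JOBID" 4
```

`scontrol show hostnames` expands the compressed node expression into one hostname per line, so the same call should work for one node or for many.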

4. However, I got the following error messages.

SLURM_JOBID: 13411368
SLURM_JOB_NODELIST: vmp1247
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Reconnection attempt 1 of 3
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Reconnection attempt 2 of 3
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Reconnection attempt 3 of 3
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Charmrun> Error 255 returned from remote shell (vmp1247:0)
Charmrun> Too many reconnection attempts; bailing out
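
The "Permission denied" lines come from ssh itself: charmrun starts the remote shell non-interactively, so it cannot answer a password prompt and every authentication method fails. One way to test this outside of NAMD is sketched below. `check_ssh` and the `SSH_BIN` override are hypothetical names introduced for illustration. The check only shows whether key-based login to a host works; it does not say how a given cluster expects keys to be set up.

```bash
#!/bin/bash
# Sketch only: test whether non-interactive (key-based) ssh to a host works.
# check_ssh is a hypothetical helper; SSH_BIN lets you substitute another client.
check_ssh() {
  local host=$1
  # BatchMode=yes disables password prompts, which mimics how charmrun
  # invokes the remote shell. stdin is redirected so that ssh does not
  # swallow the input of a surrounding "while read" loop.
  if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 \
       "$host" true < /dev/null 2> /dev/null; then
    echo "ok $host"
  else
    echo "FAILED $host"
    return 1
  fi
}

# Example: check every distinct host named in a nodelist file.
#   awk '$1 == "host" { print $2 }' "nodelist.$SLURM_JOBID" | sort -u \
#     | while read -r h; do check_ssh "$h"; done
```

If this prints FAILED for the compute node, the error is in the ssh configuration rather than in the nodelist format.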
