Running NAMD in parallel

From: Jorgen Simonsen (jorgen589_at_gmail.com)
Date: Sat Oct 03 2009 - 16:18:57 CDT

Hi all,
I am trying to run NAMD in parallel, but it is not working. I downloaded
NAMD_2.7b1_Linux-x86_64-TCP and am using the namd2 and charmrun binaries
from it. If I run a single job, there are no problems:

namd2 conf.conf > log.log

It runs and produces the expected results. If, on the other hand, I submit
the job asking for 8 CPUs:

#!/bin/sh
### Note: No commands may be executed until after the #PBS lines
### Job name
#PBS -N test
### Output files
#PBS -e test.err
#PBS -o test.log
### Queue name (small, medium, long, verylong)
#PBS -q small
### Number of nodes
#PBS -l nodes=2:ppn=4
# Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS processors

# Go to the directory from which the job was submitted (initial directory is $HOME)
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
./../../Programs/NAMD/charmrun ++local ../../../Programs/NAMD/namd2 \
    +p$NPROCS min.conf > data.log

it starts up 16 threads on one processor, which is of course a waste. If I
remove ++local and add ++verbose, I get:
Charmrun> charmrun started...
Charmrun> using /home/user/.nodelist as nodesfile
Charmrun> remote shell (localhost:0) started
Charmrun> remote shell (localhost:1) started
Charmrun> remote shell (localhost:2) started
Charmrun> remote shell (localhost:3) started
Charmrun> node programs all started
connect to address 127.0.0.1: Connection refused
connect to address 127.0.0.1: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 127.0.0.1: Connection refused
connect to address 127.0.0.1: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 127.0.0.1: Connection refused
connect to address 127.0.0.1: Connection refused
trying normal rsh (/usr/bin/rsh)
connect to address 127.0.0.1: Connection refused
connect to address 127.0.0.1: Connection refused
trying normal rsh (/usr/bin/rsh)
localhost.localdomain: Connection refused
localhost.localdomain: Connection refused
localhost.localdomain: Connection refused
localhost.localdomain: Connection refused
Charmrun> Error 1 returned from rsh (localhost:0)
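The log shows charmrun falling back to the default ~/.nodelist (which only names localhost) and to rsh, which is refused. One way to make the launch follow the PBS allocation is to build a charmrun nodelist from $PBS_NODEFILE and pass it with ++nodelist. This is a minimal sketch, not a confirmed fix: it assumes $PBS_NODEFILE lists one hostname per allocated core, that the cluster accepts ssh (hence "++shell ssh" on the group line), and that NAMD lives under a hypothetical $NAMD_DIR; make_nodelist is a made-up helper name, not part of NAMD or PBS.

```shell
#!/bin/sh
# make_nodelist is a hypothetical helper, not part of NAMD or PBS.
# Assumption: $PBS_NODEFILE lists one hostname per allocated core.
# Assumption: the cluster allows ssh, since rsh is refused in the log.
make_nodelist() {
    printf 'group main ++shell ssh\n'
    # "|| [ -n ... ]" keeps a final line that lacks a trailing newline
    while IFS= read -r host || [ -n "$host" ]; do
        if [ -n "$host" ]; then
            printf 'host %s\n' "$host"
        fi
    done < "$1"
}

# Launch step: only runs inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # Assumed install location; adjust to wherever charmrun and namd2 live.
    NAMD_DIR=${NAMD_DIR:-$HOME/Programs/NAMD}
    cd "$PBS_O_WORKDIR" || exit 1
    make_nodelist "$PBS_NODEFILE" > nodelist
    NPROCS=$(grep -c . "$PBS_NODEFILE")   # count non-empty lines
    "$NAMD_DIR/charmrun" ++nodelist nodelist +p"$NPROCS" \
        "$NAMD_DIR/namd2" min.conf > data.log
fi
```

Because each core appears as its own host line, a node with 4 allocated cores is listed 4 times. The generated nodelist is left in the submission directory so it can be inspected afterwards, and ++verbose can be added to the charmrun line to see which hosts it contacts.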

What is wrong, and how can I fix this? Thanks in advance.

Best

Jorgen
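Since charmrun starts its remote processes through a remote shell and every attempt in the log above was refused, a quick pre-flight check can show whether the allocated nodes accept non-interactive logins. A minimal sketch, assuming ssh (not rsh) is the cluster's remote shell; check_ssh is a hypothetical helper name. BatchMode=yes makes ssh fail instead of prompting for a password, which is what a batch launcher would run into.

```shell
#!/bin/sh
# check_ssh is a hypothetical pre-flight helper, not part of NAMD or PBS.
# For each distinct host in a PBS-style node file, attempt a
# non-interactive ssh login and report which hosts fail.
check_ssh() {
    status=0
    for host in $(sort -u "$1"); do
        # -n: do not read stdin; BatchMode: never prompt for a password
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host" >&2
            status=1
        fi
    done
    return "$status"
}

# Only run the check inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    check_ssh "$PBS_NODEFILE" || echo "some hosts refused non-interactive ssh" >&2
fi
```

If any host is reported as FAILED, passwordless ssh (or whatever remote shell the cluster supports) has to be set up before charmrun can start processes there.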

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:53:20 CST