IB version and nodelist

From: Neelanjana Sengupta (senguptan_at_gmail.com)
Date: Mon Aug 22 2011 - 00:36:09 CDT

Dear NAMD experts,

We are attempting to run the InfiniBand version of NAMD 2.8
(Linux-x86_64-ibverbs <http://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1159>)
on an InfiniBand cluster in which each node has 12 processors. The
compute nodes are named sequentially *cn001* through *cn056* (as listed
by the command *pbsnodes -a*).
Our .nodelist file looks like this:

group main
host cn001
host cn002
..
..
host cn054
host cn055
host cn056
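The nodelist above uses Charm++'s plain-text format: a "group" line followed by one "host" line per node. Rather than typing all 56 entries by hand, the file can be generated. The Python sketch below is a hypothetical helper (make_nodelist is not part of NAMD or Charm++). It assumes the cn001 to cn056 naming and three-digit zero padding described in this message, and it only prints the text so that no existing file is overwritten.

```python
def make_nodelist(prefix="cn", first=1, last=56, width=3, group="main"):
    """Build nodelist text in the 'group ... / host ...' format.

    prefix, first, last and width are assumptions based on the
    cn001..cn056 node names; adjust them for another cluster.
    """
    lines = [f"group {group}"]
    for i in range(first, last + 1):
        # Zero-pad the node number, e.g. 1 -> "cn001".
        lines.append(f"host {prefix}{i:0{width}d}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print to stdout; redirect into the nodelist file if desired.
    print(make_nodelist(), end="")
```

Running it as, e.g., "python make_nodelist.py > nodelist" (hypothetical file names) produces 57 lines: the group line plus one host line per node.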

We tried running jobs with this command in the submit script:
/soft/NAMD_2.8_Linux-x86_64-ibverbs/charmrun ++ppn 12 ++p 12 ++remote-shell /soft/NAMD_2.8_Linux-x86_64-ibverbs/namd2 job.inp
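In charmrun, ++remote-shell names the program used to start processes on remote hosts, so it expects a value of its own (for example ssh). One way to keep each option paired with its value in a submit script is to assemble the command as an explicit argument list. The Python sketch below does that. The function name charmrun_argv, the choice of ssh and the nodelist file name are assumptions for illustration; ++ppn is left out only for brevity. The charmrun and namd2 paths are the ones quoted in the command above.

```python
import shlex

CHARMRUN = "/soft/NAMD_2.8_Linux-x86_64-ibverbs/charmrun"
NAMD2 = "/soft/NAMD_2.8_Linux-x86_64-ibverbs/namd2"


def charmrun_argv(nprocs, nodelist, remote_shell, config):
    """Return a charmrun command line as a list of arguments.

    Each "++option" is immediately followed by its own value, so the
    namd2 path cannot be consumed as an option argument by mistake.
    remote_shell (e.g. "ssh") and nodelist are assumptions.
    """
    return [
        CHARMRUN,
        "++p", str(nprocs),
        "++nodelist", nodelist,
        "++remote-shell", remote_shell,
        NAMD2,
        config,
    ]


if __name__ == "__main__":
    argv = charmrun_argv(12, "nodelist", "ssh", "job.inp")
    # Print a shell-quoted command line; this does not launch anything.
    print(shlex.join(argv))
```

Building the list in one place makes it easy to see which token belongs to which option before the command is pasted into the PBS script.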

However, we get errors like this:

Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-ibverbs-iccstatic
Info: Built Sat May 28 11:31:19 CDT 2011 by jim on dakar.ks.uiuc.edu
Charm++: standalone mode (not using charmrun)
Warning> RandomizCharm++: standalone mode (not using charmrun)
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/Info: 50.0469 MB of memory in use based on /proc/self/stat
Info: Configuration file is cn054
FATAL ERROR: Unable to access config file cn054
[0] Stack Traceback:
  [0:0] CmiAbort+0x5c [0xbf56fa]
  [0:1] _Z8NAMD_diePKc+0x62 [0x535482]
  [0:2] _Z18after_backend_initiPPc+0x3d0 [0x539b90]
  [0:3] main+0x3a [0x53978a]
  [0:4] __libc_start_main+0xf4 [0x393301d994]
  [0:5] _ZNSt8ios_base4InitD1Ev+0x52 [0x534d7a]
[0] Stack Traceback:
  [0:0] /soft/nsengupta/NAMD_2.8_Linux-x86_64-ibverbs/namd2 [0xbf55b6]
  [0:1] CmiAbort+0x8e [0xbf572c]
  [0:2] _Z8NAMD_diePKc+0x62 [0x535482]
  [0:3] _Z18after_backend_initiPPc+0x3d0 [0x539b90]
  [0:4] main+0x3a [0x53978a]
  [0:5] __libc_start_main+0xf4 Info: Running on 1 processors, 1 nodes, 1 physical nodes.

etc.
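Two lines in the log above summarize what actually happened at startup: "Charm++: standalone mode (not using charmrun)" and "Info: Running on 1 processors, 1 nodes, 1 physical nodes." A submit script could check for these automatically. The Python sketch below is a hypothetical helper (check_startup is not a NAMD tool); it assumes the log text is available as a string and that the expected process count is known. It uses re.search rather than re.match because, in the interleaved output above, the "Running on" text follows a stack-trace entry on the same line, and it allows line wrapping between words.

```python
import re

# NAMD's startup summary; \s+ tolerates spaces or line breaks between words.
RUNNING_ON = re.compile(
    r"Running on\s+(\d+)\s+processors,\s+(\d+)\s+nodes,"
    r"\s+(\d+)\s+physical nodes"
)
STANDALONE = "standalone mode (not using charmrun)"


def check_startup(log_text, expected_procs):
    """Return a list of startup problems found in a NAMD log (empty if none)."""
    problems = []
    if STANDALONE in log_text:
        problems.append("namd2 reports Charm++ standalone mode")
    m = RUNNING_ON.search(log_text)
    if m is None:
        problems.append("no 'Running on ...' line found")
    elif int(m.group(1)) != expected_procs:
        problems.append(
            f"running on {m.group(1)} processors, expected {expected_procs}"
        )
    return problems
```

A non-empty result could be used to stop the job early instead of waiting for it to fail later.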

charmrun appears to be misinterpreting the nodelist file. Could you please
suggest how we might solve this problem?

Thanks and regards,
Neelanjana Sengupta

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:57:37 CST