RE: namd 2.7 ibverbs error 93620 attaching to node

From: David A. Horita (dhorita_at_wfubmc.edu)
Date: Fri Dec 03 2010 - 11:42:46 CST

Hi,

I have the same issue ("socket closed before recv") on some, but not all, of our InfiniBand nodes: nodes on one switch are fine, nodes on the other are not. I get the 93620 error even when running the charm-6.2.2 simplearrayhello test (charmrun hello) with Charm++ compiled with the ibverbs option (again, only on one of the two IB switches). I realize that this isn't a Charm++/InfiniBand discussion group, but the Charm++ FAQ attributes this error to a coding problem in most cases. Since I can reproduce it with the Charm++ test programs, I don't think the problem is in the source.

So, without knowing the physical differences between the switches or how they are set up, does anyone have ideas I can suggest to our sysadmin? Perftest 1.5 runs over MPI without errors, and the TCP version of charmrun/NAMD works fine (the MPI version I compiled is slow, but that may be my fault), so I don't think it's as simple as something not being plugged in.

My complete details mirror Tom Bishop's; the main difference is that I get the 93620 error message before the "Waiting for 0-th client to connect" message, and I never get beyond waiting for the 0-th client. The log file looks nearly identical to the one I get when I submit the job to a set of nodes that don't have InfiniBand (the only differences are the node and job IDs).

Regards,
David

-----------------------------
David A. Horita, Ph.D.
Department of Biochemistry
Wake Forest University School of Medicine
Winston-Salem, NC 27157-1016
Tel: 336 713-4194
Fax: 336 716-7671
email: dhorita_at_wfubmc.edu
web: http://www1.wfubmc.edu/biochem/faculty/Horita.htm

-----Original Message-----
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Thomas C. Bishop
Sent: Tuesday, November 23, 2010 1:12 PM
To: namd-l_at_ks.uiuc.edu; Will Curry
Subject: namd-l: namd 2.7 ibverbs error 93620 attaching to node

Greetings, NAMD (v2.7, NAMD_2.7_Linux-x86_64-ibverbs.tar.gz)

I'm having some sort of configuration issue that seems related to the charmrun connection, and I'm hoping someone can help me solve it.

I have other versions of NAMD running, so the system and the namd.config file are good, but with NAMD_2.7_Linux-x86_64-ibverbs.tar.gz
I get this message from charmrun:
 error 93620 attaching to node
so I never even get to NAMD.

Complete details below.

Any assistance greatly appreciated.
Tom

compute-01-31 224% echo $CHARM
/scratch00/bishop/bin/NAMD_2.7_Linux-x86_64-ibverbs//charmrun
compute-01-31 225% echo $NAMD
/scratch00/bishop/bin/NAMD_2.7_Linux-x86_64-ibverbs//namd2
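One cheap sanity check on the launching node, before looking at the fabric, is to confirm that both variables resolve to executable files. A minimal POSIX sh sketch follows. check_exec is a hypothetical helper (it is not part of NAMD or Charm++), and the script assumes CHARM and NAMD are exported to child processes (setenv in tcsh), which the echo output above does not by itself confirm.

```shell
#!/bin/sh
# Sketch: report whether each argument names an executable regular file.
# check_exec is a hypothetical helper, not part of NAMD or Charm++.
check_exec() {
    status=0
    for p in "$@"; do
        if [ -z "$p" ]; then
            # An empty argument usually means the variable was not exported.
            echo "empty path (is the variable exported?)"
            status=1
        elif [ -f "$p" ] && [ -x "$p" ]; then
            echo "ok: $p"
        else
            echo "missing or not executable: $p"
            status=1
        fi
    done
    return "$status"
}

# Assumes CHARM and NAMD are exported (setenv in tcsh) as in the session above.
check_exec "${CHARM:-}" "${NAMD:-}" || echo "fix these paths before debugging charmrun"
```

Run it with sh from the same shell. A passing check only rules out a bad local path; it says nothing about whether the InfiniBand connection to the compute nodes works.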

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:54:50 CST