From: David Hemmendinger (hemmendd_at_union.edu)
Date: Fri Nov 18 2011 - 21:39:20 CST
We have namd on an IBM cluster running xcat (RedHat Linux kernel
2.6.18-194), with Mellanox OFED 1.5.2-2.1.0. We've downloaded the
Linux-x86_64-ibverbs compiled version of namd 2.8 as well as others. The
TCP versions run, but the ibverbs version fails. Charmrun reports that
the program loads on the compute nodes, but then fails with the message:
CmiAbort("failed to change qp state to RTR").
I've tried recompiling charmrun from the current version that came
with the namd source. I specified only "ibverbs" as the build option,
along with x86_64, and compiled with gcc 4.1.2, but I get the same
error.
Can anyone suggest what to fix? The only additional evidence
that I have is that when I try the ibverbs ibv_rc_pingpong test on the
compute nodes, I get the same error unless I specify an option not
documented in the man page for ibv_rc_pingpong: -g 0. Mellanox say
that this GID specification is a new option. Does that suggest that
something like it must be specified in running charmrun, and if so,
where should it be specified?
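For anyone wanting to reproduce the comparison, here is a minimal sketch of the two ibv_rc_pingpong invocations described above. The helper name pingpong_argv and the host name node01 are placeholders introduced only for illustration; the only flag assumed is -g (GID index), the one mentioned above. It only builds and prints the command lines and does not touch the fabric.

```python
import shlex


def pingpong_argv(server=None, gid_index=None):
    """Build an ibv_rc_pingpong command line (hypothetical helper).

    server=None    -> listening (server) side; a host name -> client side.
    gid_index=None -> omit -g, the case reported above as failing.
    """
    argv = ["ibv_rc_pingpong"]
    if gid_index is not None:
        # -g selects the GID index; 0 is the value that worked in the test above
        argv += ["-g", str(gid_index)]
    if server is not None:
        argv.append(server)
    return argv


if __name__ == "__main__":
    # "node01" is a placeholder for the compute node running the server side
    for gid in (None, 0):
        print(shlex.join(pingpong_argv(gid_index=gid)))            # server side
        print(shlex.join(pingpong_argv("node01", gid_index=gid)))  # client side
```

Running each printed pair on two compute nodes, server side first, should show whether the failure depends only on the presence of -g 0.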
(I also posted this query to the charm++ mailing list.)
Thanks for any help!
David Hemmendinger hemmendd at union.edu
Professor Emeritus http://athena.union.edu/~hemmendd
Computer Science Dept. +1 518 346 4489
Union College, Schenectady, NY 12308 FAX: +1 518 388 6789