namd-ibverbs fails to start

From: David Hemmendinger (hemmendd_at_union.edu)
Date: Fri Nov 18 2011 - 21:39:20 CST

        We have NAMD on an IBM cluster running xCAT (RedHat Linux kernel
2.6.18-194), with Mellanox OFED 1.5.2-2.1.0. We've downloaded the
precompiled Linux-x86_64-ibverbs build of NAMD 2.8 as well as other
builds. The TCP versions run, but the ibverbs version fails. Charmrun
reports that the program loads on the compute nodes, but it then fails
with the message:
CmiAbort("failed to change qp state to RTR").
        I've tried recompiling charmrun with gcc 4.1.2, using the
current version that came with the NAMD source. I specified only
"ibverbs" as the build option, along with x86_64, but I get the same
error.
        Can anyone suggest what to fix? The only additional evidence
that I have is that when I run the ibverbs ibv_rc_pingpong test on the
compute nodes, I get the same error unless I specify an option that is
not documented in the man page for ibv_rc_pingpong: -g 0. Mellanox say
that this GID index option is new. Does that suggest that something
like it must be specified when running charmrun, and if so, where
should it be specified?
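        For concreteness, the pingpong test that succeeds can be sketched
as below. The host name node01 is only a placeholder, not one of our
actual nodes, and the script just prints the two commands, since each
one has to be started by hand on a different compute node.

```shell
#!/bin/sh
# Placeholder host name (an assumption); substitute a real compute node.
SERVER_NODE="${SERVER_NODE:-node01}"
# GID index passed with -g; 0 is the value that made the test succeed.
GID_INDEX=0

# Server side: started first, waits for a connection.
SERVER_CMD="ibv_rc_pingpong -g ${GID_INDEX}"
# Client side: same option, plus the name of the server node.
CLIENT_CMD="ibv_rc_pingpong -g ${GID_INDEX} ${SERVER_NODE}"

# Print rather than execute: the two commands run on different nodes.
echo "on ${SERVER_NODE}: ${SERVER_CMD}"
echo "on a second node: ${CLIENT_CMD}"
```

Without the -g 0 on both sides, the same test reports the RTR failure.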
        (I also posted this query to the charm++ mailing list.)

        Thanks for any help!

  David Hemmendinger hemmendd at union.edu
  Professor Emeritus http://athena.union.edu/~hemmendd
  Computer Science Dept. +1 518 346 4489
  Union College, Schenectady, NY 12308 FAX: +1 518 388 6789
