Re: NAMD crashes with OpenMPI/OpenMX

From: Thomas Albers (talbers_at_binghamton.edu)
Date: Tue Jun 14 2011 - 13:04:29 CDT

Hello Jim,

> In this case pe 7 caught a segfault signal. There might be some useful
> information in the stack trace. Building a debug version would give you
> more information on where the segfault happened.

How is that done?
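My guess, from the charm README, would be something along these lines; the
arch-file edit and the exact option placement are assumptions on my part,
so please correct me if there is a supported way to do it:

  # rebuild charm++ without --with-production; trailing options such as
  # -g -O0 appear to be passed through to the compiler
  cd ~/NAMD_2.8_Source/charm-6.3.2
  ./build charm++ net-linux-x86_64 -j4 -g -O0

  # swap the optimization flags in the NAMD arch file for debug flags,
  # then reconfigure and rebuild
  cd ~/NAMD_2.8_Source
  sed -i 's/-O3/-g -O0/g' arch/Linux-x86_64-g++.arch
  ./config Linux-x86_64-g++ --charm-base ./charm-6.3.2 --charm-arch net-linux-x86_64
  cd Linux-x86_64-g++ && make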

> Do you see an actual performance improvement from Open-MX?

Precompiled Linux-x86_64 binary, 1 core/machine:
 TIMING: 500 CPU: 209.475, 0.336699/step Wall: 211.611, 0.338333/step
  WallClock: 215.350540

Open-MX, gcc, 1 core/machine:
 TIMING: 500 CPU: 262.249, 0.471274/step Wall: 262.249, 0.471274/step
  WallClock: 273.617798

Afraid not. I'm surprised the performance penalty is that high: one would
expect (Open-)MX to have lower latency than TCP/IP, so this result is
quite striking. Could the overhead be coming from OpenMPI?

Something else: the notes say that Charm++ can be built with Myrinet MX
support, but when I try
./build charm++ net-linux-x86_64 mx -j4 --with-production
the build fails with:

machine.c: At top level:
machine.c:2425: error: redefinition of 'CmiBarrier'
machine-mx.c:737: note: previous definition of 'CmiBarrier' was here
machine.c:2455: error: redefinition of 'CmiBarrierZero'
machine-mx.c:784: note: previous definition of 'CmiBarrierZero' was here
Fatal Error by charmc in directory
/home/ta/NAMD_2.8_Source/charm-6.3.2/net-linux-x86_64-mx/tmp

Is this a Charm++ bug?
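From the line numbers it looks as if both machine.c and machine-mx.c define
CmiBarrier/CmiBarrierZero and nothing excludes one of the pair; that is easy
to confirm from the build directory named in the error (whether a guard
macro is simply missing there is only my guess):

  cd /home/ta/NAMD_2.8_Source/charm-6.3.2/net-linux-x86_64-mx/tmp
  grep -n "CmiBarrier" machine.c machine-mx.c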

> Does the simulation you are running scale better with an InfiniBand
> network?

I'm afraid that is something we can't afford.

Thomas
