Re: NAMD on Infiniband - problems and successes

From: Jimmy Tang (jtang_at_tchpc.tcd.ie)
Date: Thu Apr 05 2007 - 12:19:24 CDT

Hi Martin,

On Sun, Mar 25, 2007 at 11:30:57AM -0600, Martin Cuma wrote:
> Hello everybody,
>
> I figured I'll share my experiences building NAMD on our new cluster and
> ask a few questions in the process.
>
> Our cluster has dual-socket, dual-core 2.4 GHz Opteron nodes with Cisco
> (Topspin) IB cards (SDR), running RHEL 4.0 and OpenIB 1.1.
>
> I got both the NAMD 2.6 and charm++ 5.9 releases, and also the latest CVS
> of both.
>
> I found the NAMD Wiki post on Infiniband and followed it to build NAMD,
> FIRST with MVAPICH 0.9.8. It built and ran, but the load balancing seems
> to be messed up, apparently due to wrong wall-time reporting, e.g.:
> ....
> OPENING EXTENDED SYSTEM TRAJECTORY FILE
> Info: Initial time: 12 CPUs 0.773921 s/step 4.47871 days/ns 89471 kB memory
> LDB: LOAD: AVG 307114 MAX 1.78672e+06 MSGS: TOTAL 257 MAXC 25 MAXP 5 None
> Info: Adjusted background load on 5 nodes.
> Warning: overload assign to 0
> LDB: LOAD: AVG 307117 MAX 446680 MSGS: TOTAL 298 MAXC 30 MAXP 6 Alg7
> LDB: LOAD: AVG 307117 MAX 446680 MSGS: TOTAL 298 MAXC 30 MAXP 6 Alg7
> Info: Initial time: 12 CPUs 1117.6 s/step 6467.59 days/ns 107876 kB memory
> LDB: LOAD: AVG 23974 MAX 335022 MSGS: TOTAL 298 MAXC 30 MAXP 6 None
> Warning: 3 processors are overloaded due to high background load.
> Info: ERROR: Could not refine at max overload
> LDB: LOAD: AVG 23974 MAX 558337 MSGS: TOTAL 298 MAXC 30 MAXP 6 Refine
> Info: Initial time: 12 CPUs -1115.93 s/step -6457.94 days/ns 107811 kB memory
> LDB: LOAD: AVG 27954.8 MAX 223368 MSGS: TOTAL 298 MAXC 30 MAXP 6 None
> Warning: 3 processors are overloaded due to high background load.
> ....
>

Some of our users are currently running with MVAPICH 0.9.5 (and 0.9.8)
with the Voltaire InfiniBand HCAs and drivers, and we haven't noticed
the above errors. But we are running with single-core Opteron CPUs.

> Notice the negative time per step in the 2nd report, and the exceedingly
> large time in the 1st.
>
> My first suspicion was cpuspeed. I turned it off, but the problem persists.
> Then I ran the MPICH Ethernet build on the same machine - that ran fine.
>
> So, my conclusion on this one is that there's some trouble with MVAPICH
> that skews the timing. A few questions on that. First, did anybody else see
> this? Second - what timers does NAMD use?
>
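
I don't know off-hand which timer charm++ ends up using in the MPI builds
(my guess would be gettimeofday() or a cycle-counter based timer, but treat
that as an assumption on my part). A tiny standalone check like the sketch
below can at least tell you whether wall-clock timing stays sane as a
process hops across cores, which is the sort of thing cpuspeed or per-core
cycle-counter drift would show up in:

/* timercheck.c -- rough sketch, not NAMD/charm++ code.
 * Hops across the online cores and compares gettimeofday()
 * against CLOCK_MONOTONIC over a short sleep on each core.
 * A negative or wildly large gettimeofday() delta would point
 * at the kind of timer skew seen in the NAMD output above.
 *
 * Build (Linux):  gcc -D_GNU_SOURCE timercheck.c -o timercheck -lrt
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int ncores = (int)sysconf(_SC_NPROCESSORS_ONLN);

    for (int core = 0; core < ncores; core++) {
        /* pin ourselves to this core before timing */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(core, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        struct timeval tv0, tv1;
        struct timespec ts0, ts1;
        gettimeofday(&tv0, NULL);
        clock_gettime(CLOCK_MONOTONIC, &ts0);
        usleep(200000);                      /* 0.2 s of "work" */
        gettimeofday(&tv1, NULL);
        clock_gettime(CLOCK_MONOTONIC, &ts1);

        double dtod  = (tv1.tv_sec - tv0.tv_sec)
                     + (tv1.tv_usec - tv0.tv_usec) * 1e-6;
        double dmono = (ts1.tv_sec - ts0.tv_sec)
                     + (ts1.tv_nsec - ts0.tv_nsec) * 1e-9;

        printf("core %2d: gettimeofday %.6f s  monotonic %.6f s\n",
               core, dtod, dmono);
    }
    return 0;
}

If the gettimeofday() and monotonic deltas agree on every core, the skew is
more likely coming from the MVAPICH/charm++ layer than from the system clock.
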
> My SECOND attempt was to build charm++ using the net-ibverbs interface.
> I just changed the path to the OpenIB libraries.
> That also built fine, but NAMD in parallel gets stuck here:
> ...
> Info: *****************************
> Info: Entering startup phase 0 with 63320 kB of memory in use.

I think there were some previous posts about this. I certainly had come
across a problem similar to the above before, but it just disappeared
when I read up on the PathScale website's advice on compiling NAMD and
used their recommended options (PathScale breaks charm++ a bit at higher
levels of optimisation).

> The charm++ test programs seem to be stuck on communication, too. It
> almost looks like a deadlock. Any pointers on what to do here? Is anybody
> else running net-ibverbs on a setup (IB drivers) similar to ours? My
> experience with charm++ is rather limited, and I can't find any
> documentation on the ibverbs part. It'd be good to get the ibverbs version
> working in hopes of better performance.
>
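
I have no pointers on the charm++ side of that hang, I'm afraid, but one
thing that helps separate fabric/driver trouble from the layer above is a
trivial two-rank ping-pong over the same HCAs using whichever MPI does work
for you (a rough sketch along the lines of what NetPIPE or Pallas measure):

/* pingpong.c -- minimal two-rank MPI ping-pong, a rough sketch to
 * check that the IB fabric and MPI stack don't deadlock on their own,
 * independent of charm++/NAMD.  Run with:  mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NITER   1000
#define MSGSIZE (64 * 1024)       /* 64 kB messages */

int main(int argc, char **argv)
{
    int rank, size;
    char *buf = malloc(MSGSIZE);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    double t0 = MPI_Wtime();
    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* rank 1 echoes everything back */
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round trip: %.2f us for %d-byte messages\n",
               (t1 - t0) / NITER * 1e6, MSGSIZE);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    free(buf);
    return 0;
}

If that runs cleanly across two nodes, the deadlock is probably in the
charm++ ibverbs machine layer rather than in the IB stack itself.
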
> The THIRD attempt is a success story - charm++ and NAMD built fine with
> OpenMPI 1.2, and it runs fine too, with good scaling (compared to the
> benchmarks posted on the NAMD website).

Since we don't use OpenMPI at our site or have dual/multi-core CPUs, we
haven't looked at libnuma, which OpenMPI supports. It might be worth you
looking at it, as it can apparently pin processes to a core and/or a memory
node so your processes don't cross memory boundaries on the hardware.
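
I haven't actually tried this myself (no multi-core boxes here yet), but
the pinning itself is simple enough to sketch. Something along these lines,
using sched_setaffinity() plus libnuma, pins a process to one core and keeps
its memory allocations on the local node; the core/node numbers below are
only placeholders, the real mapping depends on your boards:

/* pinself.c -- rough sketch of pinning a process to one core and to
 * the local memory node with sched_setaffinity() + libnuma.
 * Build (Linux):  gcc -D_GNU_SOURCE pinself.c -o pinself -lnuma
 */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    int core = 0;                 /* placeholder: first core ...    */
    int node = 0;                 /* ... on the first memory node   */

    /* pin the CPU */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* keep allocations on the local node so the process doesn't
     * cross the memory boundary between sockets */
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }
    numa_set_preferred(node);

    printf("pinned to core %d, preferring memory node %d\n", core, node);
    /* ... the real work (or an exec of the real program) goes here ... */
    return 0;
}

In practice you would more likely wrap the real job with numactl(8), or
whatever affinity options your MPI exposes, rather than patch code, but
the sketch shows what the pinning amounts to.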

Out of curiosity, do you reliably get consistent run times for the same
job over the same number of nodes, or does the run time vary +/- X% for,
say, 10 runs of the same job on your dual-core system?

> To conclude, I'd like to ask one more question - can those who run on
> Infiniband reply and say what type of MPI and IB drivers they use? That
> would be useful to get a good picture of the NAMD status on IB.
>

At our site (tested and found to work reliably):

        MPI - MVAPICH 0.9.5, 0.9.7 (with one or two fixes) and 0.9.8
        Compiler - PathScale 2.2.1, 2.4 and 2.5
        InfiniBand stack - Voltaire HCAs and the 3.5.0-15 drivers

We've been testing out OFED (it seems to give much better performance with
a number of things that we use at our site) and will probably be moving
over to it, since the IPoIB and SDP performance is much greater than with
the 3.5.0 stack from Voltaire that we are using. Though we have not yet
tested NAMD or any other computational codes, just the usual network
benchmarks like Pallas and NetPIPE.

Jimmy.

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | http://www.tchpc.tcd.ie/~jtang
