Problem with NAMD on Opteron cluster

From: Harald Tepper (h.l.tepper_at_amolf.nl)
Date: Sat Nov 12 2005 - 05:29:33 CST

Dear NAMD users/developers,

We are having a lot of problems compiling and running NAMD (2.6) on our
recently installed Opteron cluster (dual-processor nodes, type 248, with
Gigabit interconnect). Any help would be appreciated.
Our problems seem related to the posts by Marc Ma (Sep 01/2005) and
Ralph Jimenez (Sep 29/2005), but we found no responses there that would
completely solve them.

Here is a summary of our starting situation:
*) We have only the GNU and Portland compilers available, and were able to
compile charm and namd only with the GNU ones.
*) We have compiled versions both with charmrun and with MPI.
*) Results with both are similar, and also similar to those of the
downloaded precompiled binaries (AMD / 64 / TCP).

Here are the problems:
*) The code runs more or less fine on 2 processors (one node), although
CPU time does not seem to be fully used: in 'top' we see proc. 1 using
99% in 'user' mode and proc. 2 using 60-80% in 'sys' mode.
*) This only gets worse on 4 or more processors: more and more time is
spent in 'sys' mode on many processors, and some even look totally dead.
These findings seem similar to Dr. Ma's posting.
*) Performance also gets worse over time, basically after the first and
later 'load balancing' statements in the output file. Here we see
behavior similar to Dr. Jimenez's, namely 'negative timing values'. The one
response to that previous posting says maybe there is just something wrong
with the local timer. I would really hope the solution is that simple. Can
anyone give suggestions on how to find this out and/or where to specify
the timer during the installation process?
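
One rough way to look into the 'local timer' suspicion, independent of NAMD, would be to read a clock many times in a row on a compute node and count how often a reading is earlier than the one before it. The sketch below is only an illustration under assumptions: the function name count_backward_steps and the sample count are made up here, and Python's time.time is a stand-in, not necessarily the timer that charm or NAMD actually uses. A clean result therefore does not rule out a timer problem inside the program itself.

```python
import time


def count_backward_steps(clock, samples=100000):
    """Read clock() repeatedly; return (backward_steps, worst_delta).

    backward_steps counts consecutive readings where the newer value is
    smaller than the older one; worst_delta is the most negative
    difference seen (0.0 if the clock never went backwards).
    """
    backward = 0
    worst = 0.0
    prev = clock()
    for _ in range(samples):
        now = clock()
        delta = now - prev
        if delta < 0:
            backward += 1
            worst = min(worst, delta)
        prev = now
    return backward, worst


if __name__ == "__main__":
    # time.time is the wall clock and may be adjusted by the system;
    # time.monotonic is guaranteed by Python never to go backwards.
    for name, clock in (("time.time", time.time),
                        ("time.monotonic", time.monotonic)):
        n, worst = count_backward_steps(clock)
        print(f"{name}: {n} backward steps, worst delta {worst:.9f} s")
```

Running this on each node of the cluster would at least show whether the wall clock on that node ever steps backwards during a short test.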

Thanks in advance for any help.

Harald Tepper
Amsterdam
