From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Wed Sep 20 2006 - 00:07:00 CDT
The shared processors sound like the problem.
NCSA's Altix is set up to assign dedicated processors from the queueing
system and we get consistent performance. Their user docs are at
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/ and you can
probably have your sysadmin send them an email.
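Roughly speaking, a job there just asks the queueing system for dedicated
CPUs and the scheduler keeps other jobs off them. A minimal csh sketch (the
exact PBS resource keywords and file names here are illustrative and
site-specific, so check your local docs):

    #!/bin/csh
    #PBS -l ncpus=8                # request 8 CPUs dedicated to this job
    cd $PBS_O_WORKDIR              # run from the submission directory
    mpirun -np 8 namd2 myrun.conf > myrun.log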
-Jim
On Mon, 18 Sep 2006, Alessandro Cembran wrote:
> Hi,
>
> I've been experiencing a problem running NAMD (any of versions 2.6b1, 2.6b2,
> and 2.6) on a 256-processor Altix 3700 BX2 node
> (http://www.msi.umn.edu/altix/intro/).
> With systems of different sizes (~55,000 or ~190,000 atoms) and different
> numbers of processors (8 or 40), the performance of my calculations is not
> reproducible at all. In particular, a job might run extremely fast (i.e., at
> almost linear scaling) for hours or days, then all of a sudden slow down to
> 10% or even ~2% of peak performance and never recover.
> I talked with the systems manager here, and he said this is related to the
> architecture of the machine, because many jobs compete for the network
> resources. In fact, I was able to trace several of the slowdowns to the
> moment another "massively parallel" NAMD job started on the same node, after
> which both jobs ran very slowly.
> So I was wondering whether anything can be done to make better use of the
> Altix architecture. In particular, is there a way to reduce or tune the
> message passing among the processors?
> Note: I always set the variable MPI_DSM_DISTRIBUTE.
> I also set MPI_MEMMAP_OFF=1, because my jobs crashed after running for a
> while: they ran out of memory. The following is a quote from the systems
> manager (both settings are sketched together after the quote):
>> Another NAMD user ran into a problem with respect to the amount of virtual
>> memory that was being allocated to NAMD by the operating system on the
>> 256-processor Altix node. It turns out that the Altix MPI is designed to
>> set up huge memory maps that speed up performance when running MPI jobs
>> that share memory between separate Altix partitions (a feature we do not
>> use). When this other NAMD user attempted to run large NAMD jobs,
>> they would segfault. If he set the MPI_MEMMAP_OFF environment variable,
>> his jobs no longer segfaulted.
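> For concreteness, the two settings together look like this in my
> environment (csh syntax; the mpirun line is only an example invocation):
>
>     setenv MPI_DSM_DISTRIBUTE      # pin each MPI rank to its own CPU
>     setenv MPI_MEMMAP_OFF 1        # skip the huge cross-partition memory
>                                    # maps that caused the segfaults above
>     mpirun -np 40 namd2 myrun.conf > myrun.log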
> Thanks in advance,
>
> Alessandro
>
> --
> Alessandro Cembran, PhD
> Post Doctoral Associate
> Mailing Address:
> Univ. of Minnesota, Dept. of Chemistry
> G2, 139 Smith Hall, 207 Pleasant St SE
> Minneapolis, MN 55455-0431
> Office:
> Univ. of Minnesota, Walter Library
> 117 Pleasant St SE, Room 473
> Phone: +1 612-624-4617
> E-mail: cembran_at_chem.umn.edu
>