Re: NAMD memory problems on ASC's SGI Altix machine

From: Margaret Kahn (Margaret.Kahn_at_anu.edu.au)
Date: Thu Jan 19 2006 - 17:25:47 CST

Sterling,

  I don't know if this is relevant, but we have found on our SGI Altix
that we need to set the environment variable MPI_MAPPED_STACK_SIZE
(and maybe even MPI_MAPPED_HEAP_SIZE) to a value such as 512000
rather than leaving it at the default maximum stack size. This is
because of the way that Charm++ does its memory mapping.
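
As a rough sketch of what that looks like in practice, the snippet below
builds a job environment with those variables set before NAMD is
started. The helper names (altix_mpi_env, launch) are made up for
illustration, and the launch command is left to the caller because it is
site specific. The value 512000 is simply the one mentioned above; check
your MPI documentation for its units and for a value suited to your
system.

```python
import os
import subprocess


def altix_mpi_env(stack_size=512000, heap_size=None, base=None):
    """Return a copy of the environment with the MPI mapping variables set.

    stack_size and heap_size are passed through as-is; 512000 is only the
    example value from this message, not a recommended setting.
    """
    env = dict(os.environ if base is None else base)
    env["MPI_MAPPED_STACK_SIZE"] = str(stack_size)
    # The heap variable is optional ("maybe even" needed), so only set it
    # when the caller asks for it.
    if heap_size is not None:
        env["MPI_MAPPED_HEAP_SIZE"] = str(heap_size)
    return env


def launch(cmd, env):
    """Run the job command (a list of strings) with the adjusted environment."""
    return subprocess.run(cmd, env=env, check=False).returncode
```

You would call launch() with whatever command line your site normally
uses to start NAMD, for example from inside the LSF job script, so that
the variables are in place before the MPI processes start.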

    Margaret

On 20/01/2006, at 3:46 AM, Sterling Paramore wrote:

> Hi, I'm having some trouble running NAMD on an SGI Altix machine.
> I'm using the precompiled binary from the website and I'm trying to
> run a 172,000 atom simulation on 128 processors (I tried compiling
> it myself, but it had the same problem and was 2x slower). When
> NAMD starts up, it says that it's using 14720 kB of memory.
> However, after about 130,000 steps, the job crashes and I get the
> following error from LSF,
>
> TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.
> Exited with exit code 143.
>
> Resource usage summary:
>
> CPU time :1205194.00 sec.
> Max Memory : 115208 MB
> Max Swap : -2097151 MB
>
> Max Processes : 129
> Max Threads : 129
>
> So the job actually ended up using 115GB of memory! Also, when I
> try to use a smaller number of processors, the job crashes earlier
> than 130,000 steps with a similar error (e.g., when I try 70
> processors, the job crashes after about 6000 steps). Any ideas?
>
> Thanks,
> Sterling
