NAMD and NUMA

From: Joseph Farran (jfarran_at_uci.edu)
Date: Wed May 07 2014 - 13:54:51 CDT

Next in thread: Norman Geist: "AW: NAMD and NUMA"

Hi All / NAMD support.

We are running NAMD 2.9 on CentOS 6.5 with Berkeley checkpointing. Jobs
checkpoint and restart just fine; however, when a job restarts on
another node, the time to finish increases 2x to 3x:

TIMING: 16000 CPU: 668.71, 0.0411388/step Wall: 668.71, 0.0411388/step, 5.53088 hours remaining, 4338.894531 MB of memory in use.
TIMING: 17000 CPU: 710.398, 0.0416875/step Wall: 710.398, 0.0416875/step, 5.59307 hours remaining, 4338.894531 MB of memory in use.

TIMING: 18000 CPU: 817.05, 0.106652/step Wall: 817.05, 0.106652/step, 14.2795 hours remaining, 4338.894531 MB of memory in use.
TIMING: 19000 CPU: 943.168, 0.126118/step Wall: 943.168, 0.126118/step, 16.8507 hours remaining, 4338.894531 MB of memory in use.
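
To put a number on the slowdown, the per-step wall time can be pulled out of the TIMING lines with a small script. This is only a sketch: the function names are made up for illustration (they are not part of NAMD), and it assumes the restart happened between steps 17000 and 18000, as the output above suggests.

```python
import re

# Captures the step number and the wall-clock seconds per step of a TIMING
# entry. re.S lets the pattern span entries that the mailer wrapped.
TIMING_RE = re.compile(
    r"TIMING:\s+(\d+)\s+CPU:.*?Wall:\s+[\d.]+,\s+([\d.]+)/step", re.S
)

def wall_per_step(log_text):
    """Return {step: wall seconds per step} for every TIMING entry."""
    return {int(step): float(sec) for step, sec in TIMING_RE.findall(log_text)}

def slowdown(log_text, restart_step):
    """Mean per-step time at or after restart_step divided by the mean before it."""
    times = wall_per_step(log_text)
    before = [t for s, t in times.items() if s < restart_step]
    after = [t for s, t in times.items() if s >= restart_step]
    return (sum(after) / len(after)) / (sum(before) / len(before))

# The four TIMING entries quoted in this message.
LOG = """
TIMING: 16000 CPU: 668.71, 0.0411388/step Wall: 668.71,
0.0411388/step, 5.53088 hours remaining, 4338.894531 MB of memory in use.
TIMING: 17000 CPU: 710.398, 0.0416875/step Wall: 710.398,
0.0416875/step, 5.59307 hours remaining, 4338.894531 MB of memory in use.
TIMING: 18000 CPU: 817.05, 0.106652/step Wall: 817.05, 0.106652/step,
14.2795 hours remaining, 4338.894531 MB of memory in use.
TIMING: 19000 CPU: 943.168, 0.126118/step Wall: 943.168,
0.126118/step, 16.8507 hours remaining, 4338.894531 MB of memory in use.
"""

# Assumption: steps from 18000 onward ran on the new node.
ratio = slowdown(LOG, restart_step=18000)
print(f"slowdown after restart: {ratio:.2f}x")
```

On the four entries above this comes out at roughly 2.8x, in line with the 2x to 3x we see across jobs.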

The issue seems to be with NUMA memory placement. When the job restarts
on a different but similar node, the original memory placement is lost.

Does anyone know how to save the current memory allocation and restore
it with Linux numactl?
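
For the "save" half, the kernel exposes per-process placement in /proc/<pid>/numa_maps, where each mapping carries N<node>=<pages> fields. The sketch below only records that placement as page counts per node; it does not restore anything, which is the part I am asking about. The function names and the sample text (including the namd2 path) are made up for illustration.

```python
import re
from collections import Counter

# Each mapping line in numa_maps lists resident pages per node as N<node>=<pages>.
NODE_RE = re.compile(r"\bN(\d+)=(\d+)\b")

def pages_per_node(numa_maps_text):
    """Sum the resident page counts per NUMA node over all mappings."""
    totals = Counter()
    for node, pages in NODE_RE.findall(numa_maps_text):
        totals[int(node)] += int(pages)
    return dict(totals)

def read_numa_maps(pid="self"):
    """Read /proc/<pid>/numa_maps (requires a NUMA-enabled Linux kernel)."""
    with open(f"/proc/{pid}/numa_maps") as fh:
        return fh.read()

# Hypothetical sample in the numa_maps format, not taken from a real job.
SAMPLE = """\
00400000 default file=/usr/bin/namd2 mapped=12 N0=12 kernelpagesize_kB=4
7f2c00000000 default anon=1000 dirty=1000 N0=250 N1=750 kernelpagesize_kB=4
"""

print(pages_per_node(SAMPLE))
```

Running pages_per_node(read_numa_maps(pid)) against the NAMD process before the checkpoint and again after the restart would at least show whether the pages ended up on a different node than the cores the job is running on.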

Thanks,
Joseph

