Re: residue_rmsd.tcl segmentation fault

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Apr 04 2014 - 04:12:38 CDT

On Fri, Apr 4, 2014 at 4:18 AM, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:

>
>
> *From:* Axel Kohlmeyer [mailto:akohlmey_at_gmail.com]
> *Sent:* Friday, 4 April 2014 10:04
> *To:* Norman Geist
> *Cc:* John Xi; Namd Mailing List
> *Subject:* Re: namd-l: residue_rmsd.tcl segmentation fault
>
> On Fri, Apr 4, 2014 at 3:53 AM, Norman Geist <
> norman.geist_at_uni-greifswald.de> wrote:
>
> Hi John,
>
>
>
> Unfortunately, you can't use swap like additional memory. The system will
> kill processes that are this hungry for memory, as swap is only useful for
> swapping out inactive pages. Additionally, if the Tcl script contains
> "exec", you have another problem, as Tcl needs to
>
>
>
> if you are willing to live with a massive performance drop, then you *can*
> use swap like RAM. if you put swap space on a RAID-0 made from SSDs, the
> performance drop is not that massive at all and when you have exploited the
> maximum RAM capacity of a mainboard, this may be the only option without
> having to go for an extremely expensive option.
>
>
>
> I've seen some OSes decide differently. Even if swap isn't full, the OS
> will, under some conditions, kill processes that try to allocate memory.
> And the bad part is that it does so in a "race condition like" way that can
> hit any running process, not only the one currently consuming the memory.
> I've seen the OS kill some background processes because they tried to
> allocate memory when none was left due to a large VMD.
>

out-of-memory behavior and use of swap are two different issues. also, one
has to distinguish between address space, virtual memory, and physical
memory. now the exact behavior of what happens depends on multiple
settings that can either be changed at run time or are set when the kernel
is compiled. by default the linux kernel does optimistic memory
management and allows memory overcommitment (its default heuristic refuses
only obviously excessive requests, and it can be set to never refuse), that
is you can reserve more address space than there is virtual and physical
memory, up to what the architecture allows. you can also configure the exact
opposite, i.e. to require that for every byte of reserved address space
there is a byte of virtual memory available. on top of that, certain
devices (e.g. GPUs) can require to be backed by immutable physical memory
only, and that messes things up some more. the sometimes erratic behavior
of the default OOM-killer in linux has repeatedly been a hotly debated
topic since the very early days of linux. multiple attempts to improve the
heuristics have been made, but the only really safe way to not run into it
is to disable overcommitment entirely. that in turn will indeed lead to the
situation that processes can run out of memory long before (virtual) memory
is exhausted, since many applications request address space that they then
don't use.
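
to see which of these policies a given linux machine is running with, something like the following python sketch can help. it assumes a linux /proc filesystem that exposes /proc/sys/vm/overcommit_memory and the CommitLimit and Committed_AS fields of /proc/meminfo; the function names are made up for this illustration and are not part of VMD, NAMD, or any library.

```python
# sketch: report the linux overcommit policy and the commit accounting.
# assumption: a linux /proc filesystem; helper names are hypothetical.
from pathlib import Path

OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit, requests are never refused",
    2: "strict accounting, commits are capped at CommitLimit",
}

def parse_meminfo(text):
    """Map field name -> integer value from /proc/meminfo-style text.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def commit_headroom_kb(meminfo):
    """CommitLimit minus Committed_AS in kB, or None if a field is missing.

    The limit is only enforced in mode 2; in modes 0 and 1 a negative
    value simply means the system is currently overcommitted.
    """
    if "CommitLimit" not in meminfo or "Committed_AS" not in meminfo:
        return None
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]

if __name__ == "__main__" and Path("/proc/meminfo").exists():
    mode_file = Path("/proc/sys/vm/overcommit_memory")
    if mode_file.exists():
        mode = int(mode_file.read_text())
        print("overcommit mode:", mode, "-", OVERCOMMIT_MODES.get(mode, "unknown"))
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    print("CommitLimit  (kB):", info.get("CommitLimit"))
    print("Committed_AS (kB):", info.get("Committed_AS"))
    print("headroom     (kB):", commit_headroom_kb(info))
```

the mode can be changed at run time through the vm.overcommit_memory sysctl; the script only reads the current state and changes nothing.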

the good thing about linux is that you have a choice. other OSs handled
this similarly, e.g. IRIX 6.x had options to set how much overcommitment
would be allowed, and enabling it was often required in order to make
old-style fortran (77 and older) programs with static memory dimensioning
work.

>
> fork() to "exec", which duplicates the memory of the parent process. You
> might want to reduce the number of frames; a stride of 2 can already halve
> the memory usage.
>
>
>
> that is not quite true. on Linux allocated memory is generally flagged as
> copy-on-write, thus a fork will not cost you that much unless you modify
> that RAM.
>
>
>
> Open VMD and load data until it is using a bit more than half of your
> memory. Now try "exec ls". It will fail due to insufficient memory; in
> fact, with openSUSE 13.1 and older it does so even if counting swap would
> provide enough memory. So this proves both of my comments. Maybe it's
> different in other distros. I've seen this problem on a machine with 128 GB
> of RAM. A little more than 64 GB was used by VMD, and I couldn't use STRIDE
> from VMD, as it is called via "exec" and fails with a message like "not
> enough memory" in the VMD console.
>

again, memory used (physical or virtual) and address space reserved are
different things. also, you may run out of stack space.
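
the difference between reserved address space and memory actually in use can be read off for any process. the python sketch below assumes the linux /proc/<pid>/status format with its VmSize (address space) and VmRSS (resident physical memory) fields; the helper names and the 256 MiB mapping size are arbitrary choices for this illustration.

```python
# sketch: reserved address space (VmSize) versus resident memory (VmRSS).
# assumption: linux /proc/<pid>/status; helper names are hypothetical.
import mmap
from pathlib import Path

def parse_status_kb(text, keys=("VmSize", "VmRSS", "VmSwap")):
    """Pick the requested kB fields out of /proc/<pid>/status-style text."""
    found = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and name in keys and parts and parts[0].isdigit():
            found[name] = int(parts[0])
    return found

def untouched_kb(status):
    """Address space that is reserved but not resident in RAM, in kB."""
    return status.get("VmSize", 0) - status.get("VmRSS", 0)

def demo(size=256 * 1024 * 1024):
    """Reserve an anonymous mapping without touching it and compare."""
    status_file = Path("/proc/self/status")
    before = parse_status_kb(status_file.read_text())
    region = mmap.mmap(-1, size)  # address space is reserved here ...
    after = parse_status_kb(status_file.read_text())
    region.close()                # ... but no page was ever written to
    print("VmSize grew by (kB):", after.get("VmSize", 0) - before.get("VmSize", 0))
    print("VmRSS  grew by (kB):", after.get("VmRSS", 0) - before.get("VmRSS", 0))

if __name__ == "__main__" and Path("/proc/self/status").exists():
    demo()
```

pointing parse_status_kb at /proc/<pid>/status of a running VMD (with its real process id) shows how much of its address space is actually backed by RAM at that moment.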

as far as empirical knowledge goes, it has the problem of being just that,
empirical. i do operate a machine that does use a striped SSD for swap and
regularly run into a scenario where that swap gets heavily used because
there is not enough RAM and the performance degradation is acceptable,
since this is an i/o bound operation doing i/o to a separate (spinning
disk) device.

axel.

>
>
> axel.
>
> PS: Are you sure that this question is NAMD related?
>
>
>
> Norman Geist.
>
>
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of* John Xi
> *Sent:* Friday, 4 April 2014 04:49
> *To:* namd-l_at_ks.uiuc.edu
> *Subject:* namd-l: residue_rmsd.tcl segmentation fault
>
>
>
> Hi,
>
>
>
> I am getting a segmentation fault when running the residue-rmsd.tcl script
> on a 5.2GB dcd file on a Linux box. After some googling, I think it may be
> related to a system memory issue, as suggested previously. Our system has
> only 4GB of memory and 2GB of swap space, so 2GB more swap space was added
> to the system. The output of the free command is as follows:
>
>
>
>              total       used       free     shared    buffers     cached
> Mem:       3838388     109476    3728912          0       2176      29400
> -/+ buffers/cache:      77900    3760488
> Swap:      4088204     263952    3824252
>
>
>
> So, about 7.5GB of memory could be available for the system.
>
>
>
> When the script was run, the same segmentation fault occurred.
>
>
>
> To try to figure out what causes this problem, I monitored the memory
> usage of the system while the dcd file was being read. When free memory was
> down to ~20MB, the system started using swap. I did see free swap space
> drop, but only by ~0.2GB, and then the segmentation fault occurred. The
> output of the free command right before the problem is as follows:
>
>
>
>              total       used       free     shared    buffers     cached
> Mem:       3838388    3817844      20544          0        712      25260
> -/+ buffers/cache:    3791872      46516
> Swap:      4088204     429500    3658704
>
>
>
> This seems to suggest the system works fine and the memory for this
> particular run should be enough (7.5GB vs 5.2GB). Given the exact same
> script works fine on a 3.2GB dcd file, I have no idea what could be wrong.
> Can somebody help me out?
>
>
>
> Thanks,
>
>
>
> John
>
>
>
>
>
>
>
>
>
>
>
> --
> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
> College of Science & Technology, Temple University, Philadelphia PA, USA
> International Centre for Theoretical Physics, Trieste. Italy.
>
>
>
>

-- 
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:17 CST