From: Alexander Balaeff (abalaeff_at_gmail.com)
Date: Wed Mar 01 2017 - 13:42:37 CST

John:

Thanks for following up! I will try the VMD 1.9.3 build.

Josh:

Sorry for disappearing from the conversation. Your idea sounds very
plausible, thanks a lot! Exceeding the maximum value of an internal
counter is certainly a possibility. The actual error message I am
getting says that the job ran out of memory; I am not sure whether
that could be a consequence of overflowing such a counter or not:

slurmstepd: error: Job 555646 exceeded memory limit (2037876 >
2037760), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 555646 ON ec210 CANCELLED AT 2017-01-27T00:09:59 ***
slurmstepd: error: Exceeded step memory limit at some point.
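
As a rough reading of the two numbers in the first message, here is a
minimal sketch in plain Tcl. It assumes that Slurm reports both values in
kilobytes, which the message itself does not state; the variable names
are made up for illustration.

```tcl
# Values copied from the slurmstepd message above.
set used_kb  2037876
set limit_kb 2037760   ;# assumption: both numbers are in kilobytes

# How far over the limit the job was, and the limit in megabytes.
set over_kb  [expr {$used_kb - $limit_kb}]
set limit_mb [expr {$limit_kb / 1024.0}]

puts "over the limit by $over_kb KB"
puts [format "limit is about %.0f MB" $limit_mb]
```

If the unit assumption holds, the job was killed just past a limit of
roughly 2 GB, which would fit slow memory growth rather than one large
allocation.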

Size- and step-wise, I am running with a stride of 10. At the moment of
the latest crash the log file contained 14 million lines and was 882 MB
in size. That is a lot, but far from the several billion needed to
overflow an unsigned int counter. The other 5 output files contained
some 2.8 million lines each by the time of the crash and were around
80 MB each. The problem could be hardware-specific, of course: our
cluster has a mix of older and newer nodes.
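
To put those figures next to the 32-bit limits, here is a small sketch in
plain Tcl (8.5 or later, for the ** operator). It assumes that the stride
is applied to both the outer and the inner loop and that "MB" means
decimal megabytes; the variable names are only illustrative.

```tcl
# Figures quoted in this thread.
set full_lines 3000000000   ;# "over 3 billion lines" per text file at stride 1
set stride     10           ;# assumption: applied to both nested loops
set log_lines  14000000     ;# log file at the moment of the latest crash
set log_bytes  882000000    ;# 882 MB, taken as decimal megabytes

# Striding both loops shrinks the number of frame pairs by stride^2.
set expected_lines [expr {$full_lines / ($stride * $stride)}]

# Ceilings of signed and unsigned 32-bit counters.
set signed_max   [expr {2**31 - 1}]
set unsigned_max [expr {2**32 - 1}]

puts "expected lines per file at stride $stride: $expected_lines"
puts [format "log lines / unsigned max: %.4f" \
    [expr {double($log_lines) / $unsigned_max}]]
puts [format "log bytes / signed max:   %.3f" \
    [expr {double($log_bytes) / $signed_max}]]
```

Both ratios come out well below 1, so neither the line count nor the byte
size of the log had reached a 32-bit ceiling at the time of the crash.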

In any case, I will try to run the script with the new VMD and let you
guys know how it went.

Thanks again to everyone,

Alexander.

On Fri, Jan 27, 2017 at 3:36 PM, Vermaas, Joshua
<Joshua.Vermaas_at_nrel.gov> wrote:
> Hi Alexander,
>
> What output does it make before it dies? From my reading of the script,
> this should totally work, so I'm very befuddled as to why there is a
> crash. Why is the logfile so large? Oh, now I see. The combination of
> the inner and outer loops means that each textfile will contain over 3
> billion lines. I suspect that somewhere inside Tcl the file-offset
> position is stored as an (unsigned?) 32-bit integer, and Tcl explodes
> once the file size passes the 2^32 threshold. Does it complete if you
> stride by 10 or 100? That should keep the total file sizes well below
> that limit.
>
> -Josh
>
> On 01/27/2017 01:11 PM, Alexander Balaeff wrote:
>> Another possibility is that what does VMD in is the long log file
>> it creates (~2GB by the time the script presumably finishes). If the
>> standard "puts" calls somehow leave blips in memory that add up over
>> the course of the script -- or create buffers that keep growing with
>> the size of stdout -- that would be the cause of the crash. But I don't
>> know the inner workings of Tcl well enough to judge if that is the
>> issue :(
>>
>> On Fri, Jan 27, 2017 at 1:59 PM, Alexander Balaeff <abalaeff_at_gmail.com> wrote:
>>> Ashar:
>>>
>>> Thanks for your comments. Yes, I have taken the basic precautions :)
>>> 1) The solvent is deleted (and yet the trajectory is long and not
>>> loadable in its entirety); 2) No, I am not creating atomselects within
>>> my loops; there are only 12 selections, created at the beginning, which
>>> I keep using throughout the script; 3) Yes, I am loading the trajectory
>>> in smaller pieces, exactly 1 frame at a time (more precisely, two
>>> frames: one into each created molecule).
>>>
>>> Yet, the end result is an apparent memory leak and crash.
>>>
>>> Again, I am aware of the solution of breaking the trajectory into,
>>> say, 10 pieces, each of which would be readable all at once. That will
>>> create 10x10 = 100 pieces of my RMSD map, which will then need to be
>>> stitched together... doable, again. But I wanted to avoid all that
>>> hassle by simply keeping 1 frame of the trajectory in memory at a time,
>>> which appears to result in a memory leak. It must have something to do
>>> with how the VMD Tcl engine deals with repeated reads from the same
>>> file.
>>>
>>> Specifically: does 'mol addfile' *open the file anew and create a new
>>> Tcl file handle every time it is called*, or does it keep referring to
>>> the same handle once created? The former behavior would certainly
>>> result in the memory crash observed. If that is the case, would
>>> 'animate read' behave differently? If the VMD gurus on the list could
>>> clarify these questions, that would be very instructive and much
>>> appreciated :)
>>>
>>> Thanks a ton,
>>>
>>> Alexander.
>>>
>>> =====================================================
>>>
>>> Prof. Alexander Balaeff
>>>
>>> University of Central Florida
>>> NanoScience Technology Center
>>> 12424 Research Parkway, Suite 400
>>> Orlando, FL 32826
>>>
>>> Phone: (407) 823-4576
>>> FAX: (407) 882-2819
>>> E-mail: abalaeff_at_gmail.com
>>>
>>>
>>> On Fri, Jan 27, 2017 at 3:01 AM, Ashar Malik <asharjm_at_gmail.com> wrote:
>>>> Hi Alexander,
>>>>
>>>> I am not sure what the two lines of code you are using are meant to
>>>> achieve.
>>>>
>>>> From the postscript, what I get is that you couldn't load the entire
>>>> trajectory in one go because of memory issues.
>>>> A workaround is to load fewer frames from your trajectory -- or maybe
>>>> think about deleting the solvent. If you don't load the solvent, your
>>>> trajectory should become manageable (unless it is far too long). Another
>>>> workaround is to load specific frames, e.g. if your trajectory has 100
>>>> frames you can load every 3rd frame and end up with, say, 33 frames
>>>> equally spaced along the length of the trajectory. Consecutive frames
>>>> are usually too closely related to add useful information anyway (my
>>>> take), so loading frames with a little gap between them can help.
>>>>
>>>> If stripping the trajectory of waters (assuming your system has waters,
>>>> or anything else unnecessary) is not enough and your trajectory remains
>>>> big, you can then load frames at intervals.
>>>>
>>>> To skip frames, use these options of mol addfile:
>>>>
>>>> first ?? step ?? last ?? waitfor all 0
>>>>
>>>> To strip the trajectory of something, use catdcd, available here:
>>>> http://www.ks.uiuc.edu/Development/MDTools/catdcd/
>>>>
>>>> Looking at your code again:
>>>>
>>>> I wonder if you are making selections... If you make selections in VMD
>>>> using atomselect, are you deleting those selections?
>>>> In VMD/Tcl a selection keeps occupying memory and does not release it
>>>> by itself.
>>>>
>>>> So if you make a selection, do it like this:
>>>>
>>>> set sel [atomselect top all]
>>>> # ... do something with selection "sel" ...
>>>> $sel delete
>>>>
>>>> Using delete should free the space (just throwing this in there in case
>>>> this is the issue).
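>>>>
>>>> The same create/use/delete pattern inside a per-frame loop might look
>>>> like the minimal sketch below. It only runs inside VMD and assumes a
>>>> molecule with a trajectory is already loaded as "top"; the "protein"
>>>> selection text and the radius-of-gyration measurement are placeholders
>>>> chosen for illustration, not taken from the original script.
>>>>
>>>> ```tcl
>>>> # Assumes a molecule with a trajectory is already loaded as "top".
>>>> set nframes [molinfo top get numframes]
>>>> for {set f 0} {$f < $nframes} {incr f} {
>>>>     # A new selection object is created on every pass ...
>>>>     set sel [atomselect top "protein" frame $f]
>>>>     puts "$f [measure rgyr $sel]"
>>>>     # ... so it is deleted on every pass as well.
>>>>     $sel delete
>>>> }
>>>> ```
>>>>
>>>> An alternative is to create the selection once before the loop, move
>>>> it along with "$sel frame $f", and delete it once after the loop.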
>>>>
>>>> Hope one of these works for you. If it doesn't, write back.
>>>>
>>>> Best,
>>>> /A
>>>>
>>>> On Fri, Jan 27, 2017 at 11:03 AM, Alexander Balaeff <abalaeff_at_gmail.com>
>>>> wrote:
>>>>> Dear VMD community:
>>>>>
>>>>> Could anyone possibly suggest a workaround for VMD running out of
>>>>> memory due to (suspected) repeated opening of the same file? In the
>>>>> attached script, I am repeatedly reading frames from a DCD file(*)
>>>>> using mol addfile; basically, with this sequence repeated in two
>>>>> nested loops:
>>>>>
>>>>> animate delete all 0
>>>>> mol addfile $dcd_file(0) first $i last $i waitfor all 0
>>>>>
>>>>> End result: VMD (LINUXAMD64 version 1.9.2beta1) runs out of memory. I
>>>>> see nothing else in the script that could cause it and suspect that
>>>>> VMD is running afoul of the Tcl file handling: opening the same
>>>>> file multiple times, growing some internal buffers for file reading,
>>>>> etc.
>>>>>
>>>>> Has anyone encountered similar issues? Would using animate read
>>>>> instead of mol addfile work better? Any suggestion would be greatly
>>>>> appreciated.
>>>>>
>>>>> Best,
>>>>>
>>>>> Alexander.
>>>>>
>>>>> (*) The reason I am doing so is that the whole DCD file wouldn't fit
>>>>> into memory, and I want to avoid reading it in pieces and then
>>>>> stitching the results together.
>>
>