Re: unable to open restart.colvars

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Fri Jan 04 2019 - 13:10:26 CST

Hi Francesco, under ideal performance conditions NAMD (or similar MD codes,
for that matter) will execute a single step in a few milliseconds. With
restartFreq in the thousands, you are asking it every few seconds to carry
out several interdependent file operations. On a busy file system, this is
bound to fail at some point.

You should try to keep restartFreq in the tens of thousands or more. Your
overall performance will also improve, because NAMD will hang less often
while waiting for file writes to complete.
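
As a rough illustration of the arithmetic above, here is a minimal Python
sketch. The 5 ms/step timing and the 60-second target are assumed example
values, not measurements from this simulation, and the function names are
made up for the illustration.

```python
import math

# Assumed example timing: 5 ms of wall-clock time per MD step.
ASSUMED_MS_PER_STEP = 5.0


def seconds_between_writes(freq_steps: int, ms_per_step: float) -> float:
    """Wall-clock seconds between two writes issued every freq_steps steps."""
    return freq_steps * ms_per_step / 1000.0


def min_freq_for_interval(target_seconds: float, ms_per_step: float) -> int:
    """Smallest step count that leaves at least target_seconds between writes."""
    return math.ceil(target_seconds * 1000.0 / ms_per_step)


if __name__ == "__main__":
    for freq in (500, 2000, 3000, 20000):
        interval = seconds_between_writes(freq, ASSUMED_MS_PER_STEP)
        print(f"restartFreq {freq}: one write every {interval:.1f} s")
    # Hypothetical target: at most one restart write per minute.
    print("steps for >= 60 s:", min_freq_for_interval(60.0, ASSUMED_MS_PER_STEP))
```

With these assumed numbers, restartFreq 2000 means a restart write about
every 10 seconds, while 20000 spaces the writes about 100 seconds apart.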

Giacomo

On Fri, Jan 4, 2019 at 12:16 PM Francesco Pietra <chiendarret_at_gmail.com>
wrote:

> Hi Giacomo:
> writing to disk was not as frequent as at the time you are referring to:
>
> outputEnergies 2000 # multiple of fullElectFrequency or vice versa
> restartfreq 2000
> DCDfreq 5000
>
> Nonetheless I have now increased it
> outputEnergies 2000 # multiple of fullElectFrequency or vice versa
> restartfreq 3000
> DCDfreq 5000
>
> and the simulation came to an end without problems.
>
> Thanks for your advice
> francesco
>
>
> On Thu, Jan 3, 2019 at 8:58 PM Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> wrote:
>
>> Hi Francesco, a workaround was introduced in 2.10 into the file writing
>> routines of NAMD to support the file system on Blue Waters. This created a
>> few related problems, but all were fixed eventually. Since you state that
>> you used the nightly build, I don't think you're affected by any of those
>> old bugs.
>>
>> A search for that error message in this mailing list returns several
>> messages from you (e.g. last April), when writing the NAMD restart file.
>> Brian wrote the following then:
>>
>> *How frequently are you writing to disk? It looks like you are writing a
>> restart file every 500 steps, which is incredibly frequent and stresses
>> both NAMD performance and the disk. Even writing energies to output that
>> frequently can measurably slow down a simulation.*
>>
>> So obviously the writing frequency is the first thing that I would check.
>> Remember that on a busy network file system, some operations will not be
>> completed within seconds (e.g. renaming files to .BAK).
>>
>> Giacomo
>>
>> On Thu, Jan 3, 2019 at 2:21 PM Victor Kwan <vkwan8_at_uwo.ca> wrote:
>>
>>> Does it occur over and over again? Could be an I/O issue with your scratch
>>> disk.
>>>
>>> On Mon, Dec 31, 2018 at 2:47 AM Francesco Pietra <chiendarret_at_gmail.com>
>>> wrote:
>>> >
>>> > Hi all:
>>> > During NPT equilibration with restraints (single node of a cluster
>>> > with 36 cores and four Tesla GPUs, NAMD nightly build) the simulation
>>> > crashed with
>>> >
>>> > colvars: Writing the state file "./npt_restr-02.restart.colvars.state".
>>> > FATAL ERROR: Unable to open text file ./npt_restr-02.restart.colvars.state: File exists
>>> >
>>> > These are three distance colvars for both ligands.
>>> >
>>> > This problem did not occur with npt_restr-01. Also, it did not occur
>>> > with npt_restr-02 in a series of related simulations (same receptor and
>>> > ligand, with the latter in a different pose).
>>> >
>>> > I understand that I am providing little info, but I was unable to find
>>> > any error in the input, or previous, files.
>>> >
>>> > Thanks for any suggestion
>>> > francesco pietra
>>> >
>>>
>>>
>>
>> --
>> Giacomo Fiorin
>> Associate Professor of Research, Temple University, Philadelphia, PA
>> Contractor, National Institutes of Health, Bethesda, MD
>> http://goo.gl/Q3TBQU
>> https://github.com/giacomofiorin
>>
>

