Re: Slow vacuum/GBIS simulations (time to completion keeps increasing)

From: Aron Broom (broomsday_at_gmail.com)
Date: Wed Jul 03 2013 - 06:12:30 CDT

What kind of speed are you getting in the end, anyway? I'm just wondering
whether it's possible that the early estimates are somehow flawed. For your
9k-atom system, I'd be surprised if you were getting much more than 1-2
ns/day.
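
If you want the measured speed rather than the extrapolated estimate, NAMD
prints it in the log as the run progresses. Something like this (using the
log name from your submit script) pulls out the relevant lines:

  grep -E "Benchmark time|TIMING:" ubq_gbis_eq_1.log

The "Benchmark time:" lines give s/step and days/ns after startup, and the
"TIMING:" lines report the running wall-clock cost per step, so you can see
directly whether the per-step cost is actually growing or only the estimate.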

On Wed, Jul 3, 2013 at 6:06 AM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:

> The output frequency does not seem to be the problem. After running
> 2,600,000 steps (~10 hours), the largest file (the dcd file) is only 7.4 MB,
> and the time until completion has gone from 300 hours at the beginning of
> the simulation to 3,000 hours now. There is plenty of memory left on the
> node, so memory doesn't look like the bottleneck either.
>
> Nevertheless, thank you for your suggestion!
>
>
> /Ole
>
> Ph.D. student Ole J. Andersen
> iNANO, University of Aarhus
> Department of Chemistry, Aarhus University
> Langelandsgade 140,
> Building 1510-421
> 8000 Aarhus C
> Denmark
> Tel.: +45 87 15 53 16 / +45 26 39 61 50
> Mail: oja_at_chem.au.dk
> ------------------------------
> *From:* Aron Broom [broomsday_at_gmail.com]
> *Sent:* Tuesday, July 02, 2013 8:10 PM
>
> *To:* Ole Juul Andersen
> *Cc:* namd-l_at_ks.uiuc.edu
> *Subject:* Re: namd-l: Slow vacuum/GBIS simulations (time to completion
> keeps increasing)
>
> I guess the best quick check is to see how large the files from your
> failed runs ended up being before you killed them. I take your point that
> if the slowdown happens quickly, it seems like it would be something else.
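>
> A quick way to check, if the scratch directory from your submit script
> (/scratch/$PBS_JOBID) still exists, is just to list the output files:
>
>   ls -lh /scratch/<your_job_id>/
>
> and see whether the .dcd and .log files had grown into the multi-GB range.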
>
>
>
> On Tue, Jul 2, 2013 at 1:57 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
>
>> Whoops, my bad. I usually use 1 fs time steps and forgot that the tutorial
>> files use 2 fs. You have a valid point; however, I don't think this is the
>> actual cause of the problem. The simulations we performed on our own system
>> had more reasonable output frequencies, and the increase in time to
>> completion is observed very soon after the simulations start. I have just
>> submitted a job using 5000 as the output frequency to see if it helps
>> =).
>>
>> /Ole
>>
>>
>> Ph.D. student Ole J. Andersen
>> iNANO, University of Aarhus
>> Department of Chemistry, Aarhus University
>> Langelandsgade 140,
>> Building 1510-421
>> 8000 Aarhus C
>> Denmark
>> Tel.: +45 87 15 53 16 / +45 26 39 61 50
>> Mail: oja_at_chem.au.dk
>> ------------------------------
>> *From:* Aron Broom [broomsday_at_gmail.com]
>> *Sent:* Tuesday, July 02, 2013 7:45 PM
>> *To:* Ole Juul Andersen
>> *Cc:* namd-l_at_ks.uiuc.edu
>> *Subject:* Re: namd-l: Slow vacuum/GBIS simulations (time to completion
>> keeps increasing)
>>
>> Just to clarify, you realize the config file you posted is asking
>> for 1,000 ns of simulation time? That is really quite long. More
>> importantly, your output intervals are very short, so by the end you would
>> have written 5 million energy lines and 2 million DCD frames! That is
>> borderline insane. But it might not just be a question of sanity: at that
>> magnitude, your slowdown may be because the files NAMD is writing to are
>> becoming SO large that I/O is becoming the limiting factor.
>>
>> I would set all the output frequencies to ~5000 steps (10 ps at a 2 fs
>> timestep) and try it.
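>>
>> For example, a sketch of just the Output section of your conf file with
>> those intervals (the exact values are only a suggestion):
>>
>> restartfreq        5000    ;# 5000 steps = every 10 ps at a 2 fs timestep
>> dcdfreq            5000
>> xstFreq            5000
>> outputEnergies     5000
>> outputPressure     5000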
>>
>> ~Aron
>>
>>
>> On Tue, Jul 2, 2013 at 1:33 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
>>
>>> Dear all,
>>>
>>> We would like to run implicit solvent simulations of a protein complex
>>> (9,135 atoms) using the GBIS model implemented in NAMD. However, we find
>>> that the simulations are VERY slow. We have therefore searched for errors
>>> in the configuration file and in the submit script, using the mailing
>>> list archive, the NAMD tutorial, and the NAMD user guide as resources. As
>>> none of these sources gave a helpful answer (we might have missed it?),
>>> we turned to the example files provided with the implicit solvent
>>> tutorial. The pdb file of ubiquitin contains 1,231 atoms, and if we start
>>> a production run of 500 ns on a node with 12 cores (using the tutorial
>>> conf file), the simulation is estimated to take 1,000 hours. If we simply
>>> switch GBIS from 'on' to 'off' in the conf file, the time drops to ~200
>>> hours. For both of these simulations, we see the expected time to
>>> completion rise (the GBIS 'on' simulation currently has 3,500 hours
>>> remaining). This is exactly the problem we experienced while running GBIS
>>> simulations on our own system. At one point, the log file stated that the
>>> job had over 10,000 hours remaining for less than 500 ns of simulation
>>> time, running on 4 nodes!
>>>
>>> The jobs are run using NAMD 2.8; however, the problem also occurs when
>>> they are run using NAMD 2.9 on GPUs. The structures seem rather stable
>>> throughout the simulations, so I do not believe that the increase in time
>>> to completion arises from an increase in the number of neighbouring atoms
>>> (supported by the fact that the GBIS 'off' simulations also show an
>>> increase in simulation time). We don't experience problems when using
>>> explicit solvation and PME, as the time to completion decreases at a
>>> steady rate for those simulations. Have any of you experienced similar
>>> problems, and if so, did you manage to find the reason or, even better,
>>> the solution? The submit script and configuration file are presented
>>> below.
>>>
>>> Thank you.
>>>
>>> All the best,
>>> Ole
>>>
>>> ------------------------------------- SUBMIT SCRIPT
>>> -------------------------------------
>>> #!/bin/sh
>>> # Request resources - ppn is the number of processors requested per node
>>> #PBS -q q12
>>> #PBS -l nodes=1:ppn=12
>>> #PBS -l walltime=24:00:00
>>> #PBS -m abe
>>> #PBS -N ubq_test
>>> # Find additional options on
>>> #
>>> http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission
>>> # and try "man qsub"
>>>
>>> #Directory where input files are to be found
>>> export JOB=/home/oja/GPU/namd-tutorial-files/1-4-gbis/grendel
>>>
>>> #SCR is where the job is run (could be redundant)
>>> export SCR=/scratch/$PBS_JOBID
>>>
>>> # The program directory
>>> export PROG=/com/namd/NAMD_2.8_Linux-x86_64-OpenMPI
>>>
>>> #Copy all(!) the input files to the scratch directory
>>> export name=ubq
>>> export confname=ubq_gbis_eq
>>>
>>> cp -p $JOB/par_all27_prot_lipid.inp $SCR/
>>> cp -p $JOB/$confname.conf $SCR/
>>> cp -p $JOB/$name.psf $SCR/
>>> cp -p $JOB/$name.pdb $SCR/
>>>
>>> #Enter the working directory
>>> cd $SCR
>>>
>>> ######################
>>> # RUN THE SIMULATION #
>>> ######################
>>> source /com/OpenMPI/1.4.5/intel/bin/openmpi.sh
>>>
>>> mpirun -bynode --mca btl self,openib $PROG/namd2 +setcpuaffinity
>>> $confname.conf > "$confname"_1.log
>>> rsync -rlptDz $SCR/* $JOB/
>>>
>>> #That's it
>>>
>>>
>>> ----------------------------------------------------------------------------------------------
>>>
>>> ---------------------------------------- CONF FILE
>>> ----------------------------------------
>>>
>>>
>>> #############################################################
>>> ## JOB DESCRIPTION ##
>>> #############################################################
>>>
>>> # Minimization and Equilibration of
>>> # Ubiquitin in generalized Born implicit solvent
>>>
>>>
>>> #############################################################
>>> ## ADJUSTABLE PARAMETERS ##
>>> #############################################################
>>>
>>> structure ubq.psf
>>> coordinates ubq.pdb
>>>
>>> set temperature 310
>>> set outputname ubq_gbis_eq
>>>
>>> firsttimestep 0
>>>
>>>
>>> #############################################################
>>> ## SIMULATION PARAMETERS ##
>>> #############################################################
>>>
>>> # Input
>>> paraTypeCharmm on
>>> parameters par_all27_prot_lipid.inp
>>> temperature $temperature
>>>
>>> # Implicit Solvent
>>> gbis on
>>> alphaCutoff 12.0
>>> ionConcentration 0.3
>>>
>>> # Force-Field Parameters
>>> exclude scaled1-4
>>> 1-4scaling 1.0
>>> cutoff 14.0
>>> switching on
>>> switchdist 13.0
>>> pairlistdist 16.0
>>>
>>>
>>> # Integrator Parameters
>>> timestep 2.0 ;# 2fs/step
>>> rigidBonds all ;# needed for 2fs steps
>>> nonbondedFreq 1
>>> fullElectFrequency 2
>>> stepspercycle 10
>>>
>>>
>>> # Constant Temperature Control
>>> langevin on ;# do langevin dynamics
>>> langevinDamping 1 ;# damping coefficient (gamma) of 1/ps
>>> langevinTemp $temperature
>>> langevinHydrogen off ;# don't couple langevin bath to hydrogens
>>>
>>> # Output
>>> outputName $outputname
>>>
>>> restartfreq 500 ;# 500 steps = every 1 ps
>>> dcdfreq 250
>>> xstFreq 250
>>> outputEnergies 100
>>> outputPressure 100
>>>
>>>
>>> #############################################################
>>> ## EXTRA PARAMETERS ##
>>> #############################################################
>>>
>>>
>>> #############################################################
>>> ## EXECUTION SCRIPT ##
>>> #############################################################
>>>
>>> # Minimization
>>> minimize 100
>>> reinitvels $temperature
>>>
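>>> # Note: 500,000,000 steps x 2 fs/step = 1,000,000 ps = 1,000 ns (1 us)
>>> # of simulated time.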
>>> run 500000000 ;# 1000 ns
>>>
>>>
>>> ----------------------------------------------------------------------------------------------
>>>
>>> Ph.D. student Ole J. Andersen
>>> iNANO, University of Aarhus
>>> Department of Chemistry, Aarhus University
>>> Langelandsgade 140,
>>> Building 1510-421
>>> 8000 Aarhus C
>>> Denmark
>>> Tel.: +45 87 15 53 16 / +45 26 39 61 50
>>> Mail: oja_at_chem.au.dk
>>>
>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>>
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
