Re: Slow vacuum/GBIS simulations (time to completion keeps increasing)

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Jul 03 2013 - 06:19:36 CDT

On Wed, Jul 3, 2013 at 12:06 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
> The output frequency does not seem to be the problem. After running
> 2,600,000 steps (~10 hours), the largest file (the DCD file) is only 7.4 MB,
> and the estimated time until completion has gone from 300 hours at the
> beginning of the simulation to 3,000 hours now. There is plenty of memory
> left on the node, so memory does not look like the bottleneck either.

Have you looked at how the system has changed?

If you started with a very spread-out structure and it collapses, there
will be many more entries in the neighbor lists. For a system with
solvent this makes essentially no difference, but for a system without
solvent it can mean a large increase in the number of interactions to be
computed. Remember that the real-space interactions scale as O(N^2)
with the number of particles inside the cutoff.
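
As a rough illustration of that scaling, here is a minimal sketch (pure
NumPy, synthetic random coordinates; the 9135-atom count and the 16 A
pairlist distance are taken from the thread and the config quoted below)
that counts how many pairs fall within the pairlist distance when the
same atoms occupy a large versus a small volume:

# Illustration only: the number of pairs inside the pairlist distance
# grows sharply when the same atoms are packed into a smaller volume.
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 9135      # size of the protein complex mentioned in the thread
pairlist = 16.0     # Angstrom, matching "pairlistdist 16.0" in the config

def count_pairs(coords, cutoff):
    """Brute-force count of atom pairs closer than `cutoff`."""
    cut2 = cutoff * cutoff
    total = 0
    for i in range(len(coords) - 1):
        d2 = np.sum((coords[i + 1:] - coords[i]) ** 2, axis=1)
        total += int(np.count_nonzero(d2 < cut2))
    return total

# "Spread out": atoms scattered through a 150 A cube.
# "Collapsed":  the same number of atoms squeezed into a 60 A cube.
spread = rng.uniform(0.0, 150.0, size=(n_atoms, 3))
collapsed = rng.uniform(0.0, 60.0, size=(n_atoms, 3))

print("pairs within pairlist, spread out:", count_pairs(spread, pairlist))
print("pairs within pairlist, collapsed:", count_pairs(collapsed, pairlist))

For these synthetic numbers the collapsed arrangement yields roughly an
order of magnitude more pairs; for a real trajectory the effect depends
on how much the structure compacts.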

axel.

>
> Nevertheless, thank you for your suggestion!
>
>
> /Ole
>
> Ph.D. student Ole J. Andersen
> iNANO, University of Aarhus
> Department of Chemistry, Aarhus University
> Langelandsgade 140,
> Building 1510-421
> 8000 Aarhus C
> Denmark
> Tel.: +45 87 15 53 16 / +45 26 39 61 50
> Mail: oja_at_chem.au.dk
> ________________________________
> From: Aron Broom [broomsday_at_gmail.com]
> Sent: Tuesday, July 02, 2013 8:10 PM
>
> To: Ole Juul Andersen
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: Slow vacuum/GBIS simulations (time to completion keeps
> increasing)
>
> I guess the best quick check is to see how large the files from your failed
> runs ended up being before you killed them. I take your point that if the
> slowdown happens quickly, it seems like it would be something else.
>
>
>
> On Tue, Jul 2, 2013 at 1:57 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
>>
>> Whoops, my bad. I usually use 1 fs time steps and forgot that the tutorial
>> files use 2 fs. You have a valid point; however, I don't think this is the
>> actual cause of the problem. The simulations we performed on our own system
>> had more reasonable output frequencies, and the increase in time to
>> completion is observed very soon after the simulations start. I have just
>> submitted a job using 5000 as the output frequency to see if it helps =).
>>
>> /Ole
>>
>>
>> Ph.D. student Ole J. Andersen
>> iNANO, University of Aarhus
>> Department of Chemistry, Aarhus University
>> Langelandsgade 140,
>> Building 1510-421
>> 8000 Aarhus C
>> Denmark
>> Tel.: +45 87 15 53 16 / +45 26 39 61 50
>> Mail: oja_at_chem.au.dk
>> ________________________________
>> From: Aron Broom [broomsday_at_gmail.com]
>> Sent: Tuesday, July 02, 2013 7:45 PM
>> To: Ole Juul Andersen
>> Cc: namd-l_at_ks.uiuc.edu
>> Subject: Re: namd-l: Slow vacuum/GBIS simulations (time to completion
>> keeps increasing)
>>
>> Just to clarify, you realize the config file you posted is asking for 1,000
>> ns of simulation time? That is really quite long. More importantly, your
>> output intervals are very short, so by the end you would have written about
>> 5 million energy lines and 2 million DCD frames! That is borderline insane.
>> But it might not just be a question of sanity: at that volume, your
>> slowdown may be because the files NAMD is writing to become so large that
>> the I/O becomes the limiting factor.
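
A back-of-the-envelope check of those numbers, as a sketch using the
values from the conf file quoted further down (the ~12 bytes per atom
per frame for DCD coordinates is only an approximation that ignores
frame headers):

# Rough output volume implied by the posted config.
n_steps     = 500_000_000   # "run" line in the conf file
timestep_fs = 2.0           # "timestep 2.0"
n_atoms     = 1231          # ubiquitin tutorial system

energy_freq = 100           # outputEnergies
dcd_freq    = 250           # dcdfreq

sim_time_ns  = n_steps * timestep_fs / 1.0e6
energy_lines = n_steps // energy_freq
dcd_frames   = n_steps // dcd_freq

# A DCD frame stores roughly 3 single-precision floats per atom (~12 bytes).
dcd_size_gb = dcd_frames * n_atoms * 12 / 1.0e9

print(f"simulated time : {sim_time_ns:,.0f} ns")
print(f"energy outputs : {energy_lines:,}")
print(f"DCD frames     : {dcd_frames:,} (~{dcd_size_gb:,.0f} GB)")

For the posted values this comes out to roughly 5 million energy lines,
2 million frames, and a DCD file on the order of 30 GB.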
>>
>> I would set all the output frequencies to ~5000 (10 ps) and try it.
>>
>> ~Aron
>>
>>
>> On Tue, Jul 2, 2013 at 1:33 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
>>>
>>> Dear all,
>>>
>>> We would like to run implicit solvent simulations of a protein complex
>>> (9135 atoms) using the GBIS model implemented in NAMD. However, the
>>> simulations turn out to be VERY slow. We have therefore been
>>> searching for errors in the configuration file and in the submit script,
>>> using the mailing list archive, the NAMD tutorial, and the NAMD user guide
>>> as resources. As none of these sources gave a helpful answer (we might have
>>> missed it?), we turned to the example files provided with the implicit
>>> solvent tutorial. The pdb file of ubiquitin contains 1231 atoms, and if we
>>> start a production run of 500 ns on a node with 12 cores (using the tutorial
>>> conf file), the simulation is estimated to take 1,000 hours. If we simply
>>> switch GBIS from 'on' to 'off' in the conf file, the time drops to ~200
>>> hours. For both of these simulations, we see the time to expected completion
>>> rise (the GBIS 'on' simulation currently has 3500 hours remaining). This is
>>> the exact same problem that we experienced while running GBIS simulations on
>>> our own system. At one point, the log file stated that the job had over
>>> 10,000 hours remaining for less than 500 ns of simulation time and running
>>> on 4 nodes!
>>>
>>> The jobs are run using NAMD 2.8; however, the problem also occurs when
>>> they are run using NAMD 2.9 on GPUs. The structures seem rather stable
>>> throughout the simulations, so I do not believe that the increase in time
>>> to completion arises from an increase in the number of neighbouring atoms
>>> (supported by the fact that the GBIS 'off' simulations also show an
>>> increase in simulation time). We do not experience problems when using
>>> explicit solvation and PME, as the time to completion decreases at a
>>> steady rate for those simulations. Have any of you experienced similar
>>> problems, and if so, did you manage to find the reason, or even better, a
>>> solution? The submit script and configuration file are presented below.
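
One way to check whether the cost per step is really growing (and not
just the completion estimate) is to pull the wall-clock seconds per step
out of the "TIMING:" lines that NAMD prints every few hundred steps; a
minimal sketch, assuming the usual NAMD 2.x log format:

# Print step number and wall-clock seconds/step from a NAMD log,
# e.g.:  python timing.py ubq_gbis_eq_1.log
import re
import sys

# Matches lines such as:
# TIMING: 5000  CPU: 34.2, 0.0068/step  Wall: 34.5, 0.0069/step, 0.95 hours remaining, ...
pattern = re.compile(r"^TIMING:\s+(\d+).*Wall:\s*[\d.eE+-]+,\s*([\d.eE+-]+)/step")

with open(sys.argv[1]) as log:
    for line in log:
        m = pattern.match(line)
        if m:
            print(f"{int(m.group(1)):>12d}  {float(m.group(2)):.6f} s/step")

If the seconds/step column climbs steadily over the run, the per-step
cost itself is growing, not just the estimate.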
>>>
>>> Thank you.
>>>
>>> All the best,
>>> Ole
>>>
>>> ------------------------------------- SUBMIT SCRIPT
>>> -------------------------------------
>>> #!/bin/sh
>>> # Request resources - ppn is the number of processors requested per node
>>> #PBS -q q12
>>> #PBS -l nodes=1:ppn=12
>>> #PBS -l walltime=24:00:00
>>> #PBS -m abe
>>> #PBS -N ubq_test
>>> # Find additional options on
>>> #
>>> http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission
>>> # and try "man qsub"
>>>
>>> #Directory where input files are to be found
>>> export JOB=/home/oja/GPU/namd-tutorial-files/1-4-gbis/grendel
>>>
>>> #SCR is where the job is run (could be redundant)
>>> export SCR=/scratch/$PBS_JOBID
>>>
>>> # The program directory
>>> export PROG=/com/namd/NAMD_2.8_Linux-x86_64-OpenMPI
>>>
>>> #Copy all(!) the input files to the scratch directory
>>> export name=ubq
>>> export confname=ubq_gbis_eq
>>>
>>> cp -p $JOB/par_all27_prot_lipid.inp $SCR/
>>> cp -p $JOB/$confname.conf $SCR/
>>> cp -p $JOB/$name.psf $SCR/
>>> cp -p $JOB/$name.pdb $SCR/
>>>
>>> #Enter the working directory
>>> cd $SCR
>>>
>>> ######################
>>> # RUN THE SIMULATION #
>>> ######################
>>> source /com/OpenMPI/1.4.5/intel/bin/openmpi.sh
>>>
>>> mpirun -bynode --mca btl self,openib $PROG/namd2 +setcpuaffinity
>>> $confname.conf > "$confname"_1.log
>>> rsync -rlptDz $SCR/* $JOB/
>>>
>>> #That's it
>>>
>>>
>>> ----------------------------------------------------------------------------------------------
>>>
>>> ---------------------------------------- CONF FILE
>>> ----------------------------------------
>>>
>>>
>>> #############################################################
>>> ## JOB DESCRIPTION ##
>>> #############################################################
>>>
>>> # Minimization and Equilibration of
>>> # Ubiquitin in generalized Born implicit solvent
>>>
>>>
>>> #############################################################
>>> ## ADJUSTABLE PARAMETERS ##
>>> #############################################################
>>>
>>> structure ubq.psf
>>> coordinates ubq.pdb
>>>
>>> set temperature 310
>>> set outputname ubq_gbis_eq
>>>
>>> firsttimestep 0
>>>
>>>
>>> #############################################################
>>> ## SIMULATION PARAMETERS ##
>>> #############################################################
>>>
>>> # Input
>>> paraTypeCharmm on
>>> parameters par_all27_prot_lipid.inp
>>> temperature $temperature
>>>
>>> # Implicit Solvent
>>> gbis on
>>> alphaCutoff 12.0
>>> ionConcentration 0.3
>>>
>>> # Force-Field Parameters
>>> exclude scaled1-4
>>> 1-4scaling 1.0
>>> cutoff 14.0
>>> switching on
>>> switchdist 13.0
>>> pairlistdist 16.0
>>>
>>>
>>> # Integrator Parameters
>>> timestep 2.0 ;# 2fs/step
>>> rigidBonds all ;# needed for 2fs steps
>>> nonbondedFreq 1
>>> fullElectFrequency 2
>>> stepspercycle 10
>>>
>>>
>>> # Constant Temperature Control
>>> langevin on ;# do langevin dynamics
>>> langevinDamping 1 ;# damping coefficient (gamma) of 1/ps
>>> langevinTemp $temperature
>>> langevinHydrogen off ;# don't couple langevin bath to hydrogens
>>>
>>> # Output
>>> outputName $outputname
>>>
>>> restartfreq 500 ;# 500 steps = every 1 ps
>>> dcdfreq 250
>>> xstFreq 250
>>> outputEnergies 100
>>> outputPressure 100
>>>
>>>
>>> #############################################################
>>> ## EXTRA PARAMETERS ##
>>> #############################################################
>>>
>>>
>>> #############################################################
>>> ## EXECUTION SCRIPT ##
>>> #############################################################
>>>
>>> # Minimization
>>> minimize 100
>>> reinitvels $temperature
>>>
>>> run 500000000 ;# 500,000,000 steps = 1,000 ns at 2 fs/step
>>>
>>>
>>> ----------------------------------------------------------------------------------------------
>>>
>>> Ph.D. student Ole J. Andersen
>>> iNANO, University of Aarhus
>>> Department of Chemistry, Aarhus University
>>> Langelandsgade 140,
>>> Building 1510-421
>>> 8000 Aarhus C
>>> Denmark
>>> Tel.: +45 87 15 53 16 / +45 26 39 61 50
>>> Mail: oja_at_chem.au.dk
>>
>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo

--
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.
