Re: Slow vacuum/GBIS simulations (time to completion keeps increasing)

From: Aron Broom (broomsday_at_gmail.com)
Date: Wed Jul 03 2013 - 07:40:54 CDT

Yeah, 4 ns/day seems pretty good for a system that size. Hence my
initial surprise when I saw you wanted hundreds of nanoseconds.

I haven't noticed such wild overestimates from the timing, even with
implicit solvent simulations of ~2k atom systems, but maybe the setups are
different.
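
If it helps, the per-step cost NAMD itself measures can be read straight
from the log instead of trusting the running ETA; something like this,
with the log name taken from the submit script quoted below:

grep -E 'Benchmark time|TIMING:' ubq_gbis_eq_1.log | tail -5

The days/ns figure on the Benchmark lines tends to be a steadier
estimate than the hours-remaining number.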

On Wed, Jul 3, 2013 at 8:28 AM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:

> Axel, yes I have. Ubiquitin is rather globular; however, it does have a
> flexible terminus that can move around. It does in fact stick to the
> protein in the last part of the simulation. However, compared to the size
> of the rest of the system, the increased number of contacts does not seem
> to justify a factor-of-10 increase in simulation time, but I might be
> mistaken.
>
> Aron, running the 9k system on a couple of nodes, I am actually getting
> 4.3 ns/day. I get this number by ignoring the timing statistics in the
> log file and simply looking at how many hours it has taken to reach the
> current number of steps. This number does not fit the timing statistics,
> so it seems that NAMD simply has a hard time estimating the time for
> smaller systems. This, in combination with my ignorant assumption of
> being able to produce more than 4 ns/day for a 9k system using an
> implicit solvation model, seems to have gotten me confused =). Lesson
> learned!
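>
> For reference, the same arithmetic as a one-liner (example numbers from
> the ubiquitin test below: 2,600,000 steps at 2 fs in ~10 hours):
>
> steps=2600000; dt_fs=2; hours=10
> awk -v s=$steps -v dt=$dt_fs -v h=$hours \
>     'BEGIN { printf "%.1f ns/day\n", (s*dt/1e6) / (h/24) }'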
>
> I thank you for your time…
>
> Best regards,
> Ole
>
> Ph.D. student Ole J. Andersen
> iNANO, University of Aarhus
> Department of Chemistry, Aarhus University
> Langelandsgade 140,
> Building 1510-421
> 8000 Aarhus C
> Denmark
> Tel.: +45 87 15 53 16 / +45 26 39 61 50
> Mail: oja_at_chem.au.dk
>
> On Jul 3, 2013, at 1:19 PM, Axel Kohlmeyer wrote:
>
> On Wed, Jul 3, 2013 at 12:06 PM, Ole Juul Andersen <oja_at_chem.au.dk>
> wrote:
>
> The output frequency does not seem to be the problem. After running
> 2,600,000 steps (~10 hours) the largest file (the dcd file) is only
> 7.4 MB, and the time until completion has gone from 300 hours at the
> beginning of the simulation to currently 3,000 hours. There is plenty
> of memory left on the node, so it doesn't look like this is the
> bottleneck either.
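>
> As a rough cross-check (assuming this is the 1,231-atom ubiquitin test
> with the dcd frequency of 5000 from the job I resubmitted, and ~12
> bytes per atom per frame for single-precision coordinates):
>
> awk 'BEGIN { printf "%.1f MB\n", (2600000/5000) * 1231*3*4 / 1e6 }'
>
> which gives ~7.7 MB, consistent with the file size above.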
>
>
> have you looked at how the system has changed?
>
> if you started with a very spread out structure and it collapses, then
> there will be many more entries in the neighbor lists. for a system
> with solvent, this makes no difference, but for a system without, this
> can result in a big increase in the number of interactions to be
> computed. remember that the realspace interactions scale O(N**2) with
> the number of particles inside the cutoff.
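>
> back of the envelope, assuming a collapsed-phase density of ~0.1
> atoms/A^3 and the 14 A cutoff from the conf file below:
>
> awk 'BEGIN { N = 1231; rho = 0.1; rc = 14.0
>              n = rho * 4/3 * 3.14159 * rc^3
>              printf "~%d neighbours/atom, ~%.1e pairs\n", n, N*n/2 }'
>
> for a spread-out starting structure the local density, and with it the
> pair count, can easily be an order of magnitude lower.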
>
> axel.
>
> Nevertheless, thank you for your suggestion!
>
> /Ole
>
> Ph.D. student Ole J. Andersen
> iNANO, University of Aarhus
> Department of Chemistry, Aarhus University
> Langelandsgade 140,
> Building 1510-421
> 8000 Aarhus C
> Denmark
> Tel.: +45 87 15 53 16 / +45 26 39 61 50
> Mail: oja_at_chem.au.dk
>
> ________________________________
> From: Aron Broom [broomsday_at_gmail.com]
> Sent: Tuesday, July 02, 2013 8:10 PM
> To: Ole Juul Andersen
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: Slow vacuum/GBIS simulations (time to completion keeps increasing)
>
> I guess the best quick check is to see how large the files from your
> failed runs ended up being before you killed them. I take your point
> that if the slowdown happens quickly, it seems like it would be
> something else.
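>
> e.g., from the run (scratch) directory:
>
> ls -lh *.dcd *.xst *.log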
>
> On Tue, Jul 2, 2013 at 1:57 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
>
>
> Whoops, my bad. I usually use 1 fs time steps and forgot that the
> tutorial files use 2. You have a valid point; however, I don't think
> this is the actual cause of the problem. The simulations we performed
> on our own system had more reasonable output frequencies, and the
> increase in time to completion is observed very quickly after the
> simulations start. I have just submitted a job using 5000 as the
> output frequency to see if it helps =).
>
> /Ole
>
> Ph.D. student Ole J. Andersen
> iNANO, University of Aarhus
> Department of Chemistry, Aarhus University
> Langelandsgade 140,
> Building 1510-421
> 8000 Aarhus C
> Denmark
> Tel.: +45 87 15 53 16 / +45 26 39 61 50
> Mail: oja_at_chem.au.dk
>
> ________________________________
> From: Aron Broom [broomsday_at_gmail.com]
> Sent: Tuesday, July 02, 2013 7:45 PM
> To: Ole Juul Andersen
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: Slow vacuum/GBIS simulations (time to completion keeps increasing)
>
>
> Just to clarify: you realize the config file you posted is asking for
> 1000 ns of simulation time? That is really quite long. More
> importantly, your output intervals are very short, so by the end you
> would have written 5 million energy lines and 2 million DCD frames!
> That is borderline insane. But it might not just be a question of
> sanity: at that magnitude, your slowdown may be because the files NAMD
> is writing to are becoming SO large that the I/O process is becoming
> the limiting factor.
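>
> The arithmetic, using the run, timestep, dcdfreq, and outputEnergies
> values from your conf file:
>
> awk 'BEGIN { steps = 500000000; dt = 2.0
>              printf "%.0f ns, %d dcd frames, %d energy lines\n",
>                     steps*dt/1e6, steps/250, steps/100 }'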
>
>
> I would set all the output frequencies to ~5000 (10 ps) and try it.
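>
> In the output section of your conf file that would look something like
> this (5000 steps = 10 ps at the 2 fs timestep):
>
> restartfreq        5000
> dcdfreq            5000
> xstFreq            5000
> outputEnergies     5000
> outputPressure     5000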
>
>
> ~Aron
>
>
>
> On Tue, Jul 2, 2013 at 1:33 PM, Ole Juul Andersen <oja_at_chem.au.dk>
> wrote:
>
>
> Dear all,
>
> We would like to run implicit solvent simulations of a protein complex
> (9135 atoms) using the GBIS model implemented in NAMD. However, we find
> that the simulations are VERY slow. We have therefore been searching
> for errors in the configuration file and in the submit script, using
> the mailing list archive, the NAMD tutorial, and the NAMD user guide as
> resources. As none of these sources gave a helpful answer (we might
> have missed it?), we turned to the example files provided with the
> implicit solvent tutorial. The pdb file of ubiquitin contains 1231
> atoms, and if we start a production run of 500 ns on a node with 12
> cores (using the tutorial conf file), the simulation is estimated to
> take 1,000 hours. If we simply switch GBIS from 'on' to 'off' in the
> conf file, the time drops to ~200 hours. For both of these simulations,
> we see the estimated time to completion rise (the GBIS 'on' simulation
> currently has 3,500 hours remaining). This is the exact same problem
> that we experienced while running GBIS simulations on our own system.
> At one point, the log file stated that the job had over 10,000 hours
> remaining for less than 500 ns of simulation time, running on 4 nodes!
>
> The jobs are run using NAMD 2.8; however, the problem also occurs when
> they are run using NAMD 2.9 on GPUs. The structures seem rather stable
> throughout the simulations, and I therefore do not believe that the
> increase in time to completion arises from an increase in neighbouring
> atoms (supported by the fact that the GBIS 'off' simulations also show
> an increase in simulation time). We do not experience problems when
> using explicit solvation and PME, as the time to completion decreases
> at a steady rate for these simulations. Have any of you experienced
> similar problems, and if so, did you manage to find the reason, or even
> better, the solution? The submit script and configuration file are
> presented below.
>
> Thank you.
>
> All the best,
> Ole
>
>
> ------------------------------------- SUBMIT SCRIPT -------------------------------------
>
> #!/bin/sh
> # Request resources - ppn is the number of processors requested per node
> #PBS -q q12
> #PBS -l nodes=1:ppn=12
> #PBS -l walltime=24:00:00
> #PBS -m abe
> #PBS -N ubq_test
> # Find additional options on
> # http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission
> # and try "man qsub"
>
> # Directory where input files are to be found
> export JOB=/home/oja/GPU/namd-tutorial-files/1-4-gbis/grendel
>
> # SCR is where the job is run (could be redundant)
> export SCR=/scratch/$PBS_JOBID
>
> # The program directory
> export PROG=/com/namd/NAMD_2.8_Linux-x86_64-OpenMPI
>
> # Copy all(!) the input files to the scratch directory
> export name=ubq
> export confname=ubq_gbis_eq
>
> cp -p $JOB/par_all27_prot_lipid.inp $SCR/
> cp -p $JOB/$confname.conf $SCR/
> cp -p $JOB/$name.psf $SCR/
> cp -p $JOB/$name.pdb $SCR/
>
> # Enter the working directory
> cd $SCR
>
> ######################
> # RUN THE SIMULATION #
> ######################
> source /com/OpenMPI/1.4.5/intel/bin/openmpi.sh
>
> mpirun -bynode --mca btl self,openib $PROG/namd2 +setcpuaffinity \
>     $confname.conf > "$confname"_1.log
>
> rsync -rlptDz $SCR/* $JOB/
>
> # That's it
>
> ----------------------------------------------------------------------------------------------
>
>
> ---------------------------------------- CONF FILE ----------------------------------------
>
> #############################################################
> ## JOB DESCRIPTION                                         ##
> #############################################################
>
> # Minimization and Equilibration of
> # Ubiquitin in generalized Born implicit solvent
>
> #############################################################
> ## ADJUSTABLE PARAMETERS                                   ##
> #############################################################
>
> structure          ubq.psf
> coordinates        ubq.pdb
>
> set temperature    310
> set outputname     ubq_gbis_eq
>
> firsttimestep      0
>
> #############################################################
> ## SIMULATION PARAMETERS                                   ##
> #############################################################
>
> # Input
> paraTypeCharmm     on
> parameters         par_all27_prot_lipid.inp
> temperature        $temperature
>
> # Implicit Solvent
> gbis               on
> alphaCutoff        12.0
> ionConcentration   0.3
>
> # Force-Field Parameters
> exclude            scaled1-4
> 1-4scaling         1.0
> cutoff             14.0
> switching          on
> switchdist         13.0
> pairlistdist       16.0
>
> # Integrator Parameters
> timestep           2.0   ;# 2 fs/step
> rigidBonds         all   ;# needed for 2 fs steps
> nonbondedFreq      1
> fullElectFrequency 2
> stepspercycle      10
>
> # Constant Temperature Control
> langevin           on    ;# do langevin dynamics
> langevinDamping    1     ;# damping coefficient (gamma) of 1/ps
> langevinTemp       $temperature
> langevinHydrogen   off   ;# don't couple langevin bath to hydrogens
>
> # Output
> outputName         $outputname
>
> restartfreq        500   ;# 500 steps = every 1 ps
> dcdfreq            250
> xstFreq            250
> outputEnergies     100
> outputPressure     100
>
> #############################################################
> ## EXTRA PARAMETERS                                        ##
> #############################################################
>
> #############################################################
> ## EXECUTION SCRIPT                                        ##
> #############################################################
>
> # Minimization
> minimize           100
> reinitvels         $temperature
>
> run                500000000   ;# 1000 ns at 2 fs/step
>
> ----------------------------------------------------------------------------------------------
>
>
> Ph.D. student Ole J. Andersen
> iNANO, University of Aarhus
> Department of Chemistry, Aarhus University
> Langelandsgade 140,
> Building 1510-421
> 8000 Aarhus C
> Denmark
> Tel.: +45 87 15 53 16 / +45 26 39 61 50
> Mail: oja_at_chem.au.dk
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
>
>
>
> --
> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
> International Centre for Theoretical Physics, Trieste. Italy.
>
>
>

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
