RE: Slow vacuum/GBIS simulations (time to completion keeps increasing)

From: Ole Juul Andersen (oja_at_chem.au.dk)
Date: Wed Jul 03 2013 - 05:06:46 CDT

The output frequency does not seem to be the problem. After running 2,600,000 steps (~10 hours), the largest output file (the DCD file) is only 7.4 MB, yet the estimated time to completion has grown from 300 hours at the start of the simulation to 3,000 hours now. There is also plenty of free memory left on the node, so memory does not look like the bottleneck either.
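
In case it helps others debug something similar, this is a minimal sketch of how the drift can be quantified from the log. The awk field numbers assume the usual "TIMING: <step> CPU: ... Wall: ..., <t>/step, ... hours remaining" layout and should be checked against an actual log line first:

# Extract step number and wall-clock seconds/step from the NAMD log.
# Field numbers are an assumption -- verify against your own log.
grep '^TIMING:' ubq_gbis_eq_1.log \
  | awk '{gsub(/\/step,?/, "", $8); print $2, $8}' > steps_vs_sperstep.dat
# Column 1: step number; column 2: seconds per step.
# A healthy run stays flat; if the slowdown is real, column 2 grows.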

Nevertheless, thank you for your suggestion!

/Ole

Ph.D. student Ole J. Andersen
iNANO, University of Aarhus
Department of Chemistry, Aarhus University
Langelandsgade 140,
Building 1510-421
8000 Aarhus C
Denmark
Tel.: +45 87 15 53 16 / +45 26 39 61 50
Mail: oja_at_chem.au.dk
________________________________
From: Aron Broom [broomsday_at_gmail.com]
Sent: Tuesday, July 02, 2013 8:10 PM
To: Ole Juul Andersen
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Slow vacuum/GBIS simulations (time to completion keeps increasing)

I guess the best quick check is to see how large the files from your failed runs ended up being before you killed them. I take your point, though: if the slowdown happens that quickly, it does seem like it would be something else.

On Tue, Jul 2, 2013 at 1:57 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
Whoops, my bad. I usually use 1 fs time steps and forgot that the tutorial files use 2 fs. You have a valid point; however, I don't think this is the actual cause of the problem. The simulations we performed on our own system had more reasonable output frequencies, and the increase in time to completion is observed very soon after the simulations start. I have just submitted a job using 5000 as the output frequency to see if it helps =).

/Ole

Ph.D. student Ole J. Andersen
iNANO, University of Aarhus
Department of Chemistry, Aarhus University
Langelandsgade 140,
Building 1510-421
8000 Aarhus C
Denmark
Tel.: +45 87 15 53 16 / +45 26 39 61 50
Mail: oja_at_chem.au.dk
________________________________
From: Aron Broom [broomsday_at_gmail.com]
Sent: Tuesday, July 02, 2013 7:45 PM
To: Ole Juul Andersen
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Slow vacuum/GBIS simulations (time to completion keeps increasing)

Just to clarify, you realize the config file you posted is asking for 1000 ns of simulation time? That is really quite long. More importantly, your output frequencies are pretty low, so by the end you would have output 50 million energy lines! and 10 million DCD frames!! That is borderline insane. But it might not just be a question of sanity, at that magnitude, your slowdown may be because the files NAMD is writing to are becoming SO large that the I/O process is becoming the limiting factor.

I would set all the output frequencies to ~5000 steps (10 ps at the 2 fs time step) and try it.
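
Something like this for the output section of the conf file, as a sketch (the exact values are a judgment call):

restartfreq      5000   ;# every 10 ps at a 2 fs time step
dcdfreq          5000
xstFreq          5000
outputEnergies   5000
outputPressure   5000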

~Aron

On Tue, Jul 2, 2013 at 1:33 PM, Ole Juul Andersen <oja_at_chem.au.dk> wrote:
Dear all,

We would like to run implicit solvent simulations of a protein complex (9,135 atoms) using the GBIS model implemented in NAMD. However, we are finding the simulations to be VERY slow. We have therefore searched for errors in the configuration file and the submit script, using the mailing list archive, the NAMD tutorial, and the NAMD user guide as resources. As none of these gave a helpful answer (we might have missed it?), we turned to the example files provided with the implicit solvent tutorial. The ubiquitin pdb file contains 1,231 atoms, and if we start a 500 ns production run on a node with 12 cores (using the tutorial conf file), the simulation is estimated to take 1,000 hours. If we simply switch GBIS from 'on' to 'off' in the conf file, the estimate drops to ~200 hours. For both of these simulations, we see the estimated time to completion rise over the course of the run (the GBIS 'on' simulation currently has 3,500 hours remaining). This is exactly the problem we experienced when running GBIS simulations on our own system: at one point the log file stated that the job had over 10,000 hours remaining, for less than 500 ns of simulation time running on 4 nodes!

The jobs are run using NAMD 2.8; however, the problem also occurs when they are run using NAMD 2.9 on GPUs. The structures seem rather stable throughout the simulations, so I do not believe that the increase in time to completion arises from a growing number of neighbouring atoms (supported by the fact that the GBIS 'off' simulations also slow down). We see no such problem with explicit solvation and PME, where the time to completion decreases at a steady rate. Have any of you experienced similar problems, and if so, did you manage to find the reason, or even better, a solution? The submit script and configuration file are presented below.
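
(As a quick cross-check between runs, the startup cost per step can be compared from the "Info: Benchmark time:" lines NAMD prints early in each log. Only ubq_gbis_eq_1.log below is a real file name from this thread; the other two are placeholders for the GBIS 'off' and explicit-solvent logs:)

# Compare the initial per-step cost reported shortly after startup
# across the three kinds of runs; the last two log names are examples.
grep 'Benchmark time:' ubq_gbis_eq_1.log gbis_off_1.log explicit_pme_1.log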

Thank you.

All the best,
Ole

------------------------------------- SUBMIT SCRIPT -------------------------------------
#!/bin/sh
# Request resources - ppn is the number of processors requested per node
#PBS -q q12
#PBS -l nodes=1:ppn=12
#PBS -l walltime=24:00:00
#PBS -m abe
#PBS -N ubq_test
# Find additional options on
# http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission
# and try "man qsub"

#Directory where input files are to be found
export JOB=/home/oja/GPU/namd-tutorial-files/1-4-gbis/grendel

#SCR is where the job is run (could be redundant)
export SCR=/scratch/$PBS_JOBID
mkdir -p $SCR   # make sure the scratch directory exists before copying into it

#The program directory
export PROG=/com/namd/NAMD_2.8_Linux-x86_64-OpenMPI

#Copy all(!) the input files to the scratch directory
export name=ubq
export confname=ubq_gbis_eq

cp -p $JOB/par_all27_prot_lipid.inp $SCR/
cp -p $JOB/$confname.conf $SCR/
cp -p $JOB/$name.psf $SCR/
cp -p $JOB/$name.pdb $SCR/

#Enter the working directory
cd $SCR

######################
# RUN THE SIMULATION #
######################
source /com/OpenMPI/1.4.5/intel/bin/openmpi.sh

mpirun -bynode --mca btl self,openib $PROG/namd2 +setcpuaffinity $confname.conf > "$confname"_1.log
rsync -rlptDz $SCR/* $JOB/

#That's it

----------------------------------------------------------------------------------------------

---------------------------------------- CONF FILE ----------------------------------------

#############################################################
## JOB DESCRIPTION ##
#############################################################

# Minimization and Equilibration of
# Ubiquitin in generalized Born implicit solvent

#############################################################
## ADJUSTABLE PARAMETERS ##
#############################################################

structure ubq.psf
coordinates ubq.pdb

set temperature 310
set outputname ubq_gbis_eq

firsttimestep 0

#############################################################
## SIMULATION PARAMETERS ##
#############################################################

# Input
paraTypeCharmm on
parameters par_all27_prot_lipid.inp
temperature $temperature

# Implicit Solvent
gbis on
alphaCutoff 12.0
ionConcentration 0.3

# Force-Field Parameters
exclude scaled1-4
1-4scaling 1.0
cutoff 14.0
switching on
switchdist 13.0
pairlistdist 16.0

# Integrator Parameters
timestep 2.0 ;# 2fs/step
rigidBonds all ;# needed for 2fs steps
nonbondedFreq 1
fullElectFrequency 2
stepspercycle 10

# Constant Temperature Control
langevin on ;# do langevin dynamics
langevinDamping 1 ;# damping coefficient (gamma) of 1/ps
langevinTemp $temperature
langevinHydrogen off ;# don't couple langevin bath to hydrogens

# Output
outputName $outputname

restartfreq 500 ;# 500 steps = every 1 ps at 2 fs/step
dcdfreq 250 ;# every 0.5 ps
xstFreq 250 ;# every 0.5 ps
outputEnergies 100 ;# every 0.2 ps
outputPressure 100 ;# every 0.2 ps

#############################################################
## EXTRA PARAMETERS ##
#############################################################

#############################################################
## EXECUTION SCRIPT ##
#############################################################

# Minimization
minimize 100
reinitvels $temperature

run 500000000 ;# 1,000,000 ps = 1000 ns at 2 fs/step

----------------------------------------------------------------------------------------------

Ph.D. student Ole J. Andersen
iNANO, University of Aarhus
Department of Chemistry, Aarhus University
Langelandsgade 140,
Building 1510-421
8000 Aarhus C
Denmark
Tel.: +45 87 15 53 16 / +45 26 39 61 50
Mail: oja_at_chem.au.dk

--
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
