Optimal simulation parameters for NAMD 2.9 performance on a GPU/CPU workstation

From: Pino, James Christopher (james.c.pino_at_vanderbilt.edu)
Date: Tue Feb 18 2014 - 12:00:34 CST

Hello,
I have a question about NAMD performance on a GPU/CPU workstation. I am using Ubuntu 12.04.
The workstation has two Kepler Tesla K20c cards, a Quadro K5000, and two Intel Xeon E5 v2 processors with 8 hyperthreaded cores each (32 logical CPUs). I am running a binary multicore CUDA build of NAMD 2.9 with CUDA 5.5.
The system under study is a lipid bilayer containing ~50k lipid atoms, solvated with TIP3P water and ions, for a total of 189225 atoms. I am running NPT using the Nose-Hoover Langevin piston barostat.
I have read through the mailing list multiple times, explored various resources, and spent many days varying a multitude of parameters.
I have improved the performance from 0.241213 days/ns to 0.154686 days/ns over the past 4 weeks.
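Other reports often quote throughput in ns/day or s/step, so it can help to convert these two figures. Below is a minimal sketch. It assumes the 2 fs timestep from the configuration below (`tstep`) and treats days/ns as wall-clock days per simulated nanosecond. The helper names are illustrative only.

```python
FS_PER_NS = 1_000_000      # femtoseconds in one nanosecond
SECONDS_PER_DAY = 86_400

def ns_per_day(days_per_ns: float) -> float:
    """Invert a days/ns figure into simulated ns per wall-clock day."""
    return 1.0 / days_per_ns

def seconds_per_step(days_per_ns: float, timestep_fs: float = 2.0) -> float:
    """Wall-clock seconds per MD step implied by a days/ns figure (assumes a fixed timestep)."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return days_per_ns * SECONDS_PER_DAY / steps_per_ns

before, after = 0.241213, 0.154686  # days/ns, the two measurements quoted above
for label, value in (("before", before), ("after", after)):
    print(f"{label}: {ns_per_day(value):.2f} ns/day, {seconds_per_step(value):.4f} s/step")
print(f"speedup: {before / after:.2f}x")
```

With these numbers the improvement is roughly 1.56x, or about 4.1 to 6.5 ns/day.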
A few things I have observed:
+p with an odd number is always faster; I assume this is because ldbUnloadZero yes frees one core?
+devices 0,2 works best with a combination of 3 to 5 *(0,2)
Using charmrun doesn't increase speed; in fact, it decreases it slightly.

I have run hundreds of benchmarks, using many combinations of:

twoAwayX (Y,Z)
PMEProcessors
+p (all the way from 1 to 32)
+p with +ppn
0, 1, or 2 GPUs, with all combinations of CPUs
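A sweep like this is easier to reproduce if the command lines are generated instead of typed by hand. Below is a minimal sketch. It assumes `namd2` is on the PATH and that each variant of the config (different `twoAwayX` and `PMEProcessors` settings) has already been written to its own file, named by the hypothetical `config_name` helper. The sketch uses only the `+p` and `+devices` flags mentioned above, and all parameter values are placeholders, not recommendations. A CUDA build normally uses the GPUs it finds unless `+devices` restricts it, so 0-GPU baselines would need a separate CPU-only binary and are left out here.

```python
from itertools import product
from typing import List, Optional

# Placeholder axes for the sweep; replace with the values actually being tested.
P_VALUES = [7, 15, 31]                              # +p worker threads (odd, per the observation above)
DEVICE_SETS = ["0", "2", "0,2"]                     # GPU IDs passed to +devices (1 or 2 GPUs)
TWO_AWAY_X = ["no", "yes"]                          # twoAwayX setting baked into each config file
PME_PROCESSORS: List[Optional[int]] = [None, 8, 16] # None = PMEProcessors left unset

def config_name(two_away_x: str, pme_procs: Optional[int]) -> str:
    """Hypothetical naming scheme for one pre-generated variant of 10ns.namd."""
    pme = "default" if pme_procs is None else str(pme_procs)
    return f"bench_twoAwayX-{two_away_x}_pme-{pme}.namd"

def build_commands() -> List[List[str]]:
    """Return one namd2 argument list per combination of the axes above."""
    commands = []
    for p, devices, tax, pme in product(P_VALUES, DEVICE_SETS, TWO_AWAY_X, PME_PROCESSORS):
        commands.append(["namd2", f"+p{p}", "+devices", devices, config_name(tax, pme)])
    return commands

if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each run's log can then be searched for the days/ns figure that NAMD reports in its benchmark lines, and the combinations can be ranked on that basis.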

The best combination so far (file 10ns.namd):
###########################################################
set steps 5000000
set tstep 2.0
set stepscycle 21
set temp 310
set pres 1.01325
set dcdfrq 1000
set rstfrq 5000
set xstfrq 1000
set enefrq 960
set prsfrq 1000
set timfrq 1000

set dcdout "Prod/Prod/Prod-70-80ns/mito_equil_prod_70_80ns.dcd"
set rstout "Prod/Prod/Prod-70-80ns/equil_prod_70_80ns"
set prfxout "Prod/Prod/Prod-70-80ns/equil_prod_70_80ns"
set inputname "Prod/Prod/Prod-60-70ns/equil_prod_60_70ns"
 
ldbUnloadZero yes
twoAwayX no
twoAwayY no
twoAwayZ no

temperature $temp
paraTypeCharmm on
parameters par_all36_lipid.prm
structure mem_solv_ions.psf
coordinates mem_solv_ions.pdb

binCoordinates $inputname.coor ;
extendedSystem $inputname.xsc ;
 
# NAMD OUTPUT
outputName $prfxout ;# defined above
restartname $rstout ;# restart prefix
restartfreq $rstfrq ;# how often to save restart files
DCDfile $dcdout ;# dcd file name
DCDfreq $dcdfrq ;# how often to save trajectories
outputEnergies $enefrq ;# how often to report energies to stdout
outputPressure $prsfrq ;# how often to report pressure to stdout
outputTiming $timfrq ;# how often to report CPU timings and wall clock timings
 
#Other settings
rigidBonds all ;# constrain all bonds to hydrogen and keep waters rigid (SHAKE)
 
#Non-Bond Parameters
exclude scaled1-4
cutoff 12.
1-4scaling 1.0
switching on
switchdist 10.
pairlistdist 14.
margin .5
 
#PME (for full-system periodic electrostatics)
PME yes
PMEGridSpacing 1
 
# Multiple Time Step (rRESPA) Parameters
fullElectFrequency 3
nonbondedFreq 1

# Temperature Control
langevin on
langevinTemp $temp
langevinDamping 1
XSTfile $prfxout.xst
XSTfreq $xstfrq

# Constant Pressure Control (variable volume)
useGroupPressure yes ;# use pseudo-molecular virial (group H's). required by SHAKE
useFlexibleCell yes ;# constant ratio (x-y plane) and constant area (x-y) also available
useConstantRatio yes ;# constant x-y plane ratio
langevinPiston on
langevinPistonTarget $pres
langevinPistonPeriod 100.
langevinPistonDecay 50.
langevinPistonTemp $temp
 
numsteps $steps ;# see above
timestep $tstep ;# see above
stepspercycle $stepscycle ;# the number of steps per cycle
############################################################################
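The step counts and frequencies in this listing need to be consistent with one another. Below is a minimal sketch that re-checks the numbers above. It assumes two rules: `fullElectFrequency` and `nonbondedFreq` must divide `stepspercycle`, and `outputEnergies` must fall on full-electrostatics steps. Please confirm the exact requirements for your version in the NAMD User's Guide. `RunSettings`, `check`, and `simulated_ns` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RunSettings:
    """Values copied from the 10ns.namd listing above."""
    numsteps: int = 5_000_000
    timestep_fs: float = 2.0
    stepspercycle: int = 21
    full_elect_frequency: int = 3
    nonbonded_freq: int = 1
    output_energies: int = 960
    switchdist: float = 10.0
    cutoff: float = 12.0
    pairlistdist: float = 14.0

def check(s: RunSettings) -> List[str]:
    """Return a list of problems; an empty list means every assumed rule holds."""
    problems = []
    # Assumed rule: multiple-time-step frequencies divide stepspercycle.
    for name, freq in (("fullElectFrequency", s.full_elect_frequency),
                       ("nonbondedFreq", s.nonbonded_freq)):
        if s.stepspercycle % freq:
            problems.append(f"stepspercycle {s.stepspercycle} is not a multiple of {name} {freq}")
    # Assumed rule: energies are reported only on full-electrostatics steps.
    if s.output_energies % s.full_elect_frequency:
        problems.append(f"outputEnergies {s.output_energies} is not a multiple of "
                        f"fullElectFrequency {s.full_elect_frequency}")
    # Switching must start inside the cutoff, and the pairlist must cover the cutoff.
    if not s.switchdist < s.cutoff <= s.pairlistdist:
        problems.append("expected switchdist < cutoff <= pairlistdist")
    return problems

def simulated_ns(s: RunSettings) -> float:
    """Total simulated time in ns: numsteps x timestep (fs) / 1e6."""
    return s.numsteps * s.timestep_fs / 1e6

if __name__ == "__main__":
    settings = RunSettings()
    print(check(settings) or "all checks passed", f"{simulated_ns(settings)} ns")
```

With the values above, every check passes, and numsteps × timestep gives the 10 ns that the filename implies.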

Am I varying the correct parameters when optimizing the performance of my simulations?
What are the optimal conditions for running a simulation on a multicore CPU/GPU workstation?
Are there obvious flaws in my command line or parameters that are hurting NAMD's performance?
Does the NAMD performance wiki mainly apply to CPU-based simulations?

Thank you
James P

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:09 CST