From: Jianing Song (sjn_sk_at_hotmail.com)
Date: Sun Oct 10 2010 - 09:43:58 CDT
Dear all,
Recently I used NAMD-2.7b3-CUDA (CHARMM force field) to simulate two membrane systems in order to test the performance of NAMD with a GPU.
The first structure comes from the membrane protein tutorial on the NAMD website and contains 50,000 atoms (PDB ID: 1K4C, system1).
The second is a GPCR containing 110,000 atoms (PDB ID: 2rh1, system2).
I also ran the non-GPU version of NAMD (2.7b3) to compare the speedup from the GPU against multi-CPU runs.
The results are as follows:

Comparison of GPU and CPU

system1    MD time (10 ps)    Number of atoms (50,000)

Configuration   CPUTime/s       Configuration   CPUTime/s
1CPU+GPU         719.888777     1CPU             4204.590820
2CPU+GPU         495.532307     2CPU             2039.238030
4CPU+GPU         328.818407     4CPU             1075.074580
8CPU+GPU         341.784241     8CPU              583.565300

system2    MD time (10 ps)    Number of atoms (110,000)

Configuration   CPUTime/s       Configuration   CPUTime/s
1CPU+GPU        1549.540405     1CPU            10560.066406
2CPU+GPU        1093.049805     2CPU             5303.254883
4CPU+GPU         807.013306     4CPU             2698.643799
8CPU+GPU         763.851868     8CPU             1352.766357

It seems that the GPU version (NAMD-CUDA) gives only a limited speedup: the gain over the CPU-only runs falls from roughly 6x with one CPU core to under 2x with eight.
Even worse, GPU utilization was only 50%-60% throughout all of the simulations.
So, is this the normal performance of the GPU version of NAMD?
Or is there something wrong with my job? If so, how can I improve its performance?
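
To make the comparison concrete, the speedup and scaling can be computed directly from the timings in the table. The following is only a rough sketch in Python: the names `TIMINGS`, `gpu_speedup`, and `scaling_efficiency` are illustrative and not part of NAMD, and the values are the CPUTime/s figures reported above.

```python
# Timings (CPUTime/s) copied from the table in this post.
# Inner keys are the number of CPU cores used in each run.
TIMINGS = {
    "system1": {
        "gpu": {1: 719.888777, 2: 495.532307, 4: 328.818407, 8: 341.784241},
        "cpu": {1: 4204.590820, 2: 2039.238030, 4: 1075.074580, 8: 583.565300},
    },
    "system2": {
        "gpu": {1: 1549.540405, 2: 1093.049805, 4: 807.013306, 8: 763.851868},
        "cpu": {1: 10560.066406, 2: 5303.254883, 4: 2698.643799, 8: 1352.766357},
    },
}


def gpu_speedup(system, cores):
    """Ratio of CPU-only time to CPU+GPU time at the same core count."""
    runs = TIMINGS[system]
    return runs["cpu"][cores] / runs["gpu"][cores]


def scaling_efficiency(system, kind, cores):
    """Parallel efficiency relative to the 1-core run of the same kind."""
    runs = TIMINGS[system][kind]
    return runs[1] / (cores * runs[cores])


if __name__ == "__main__":
    for system in TIMINGS:
        for cores in (1, 2, 4, 8):
            print(
                f"{system} {cores} core(s): "
                f"GPU speedup {gpu_speedup(system, cores):.2f}x, "
                f"CPU-only efficiency {scaling_efficiency(system, 'cpu', cores):.2f}, "
                f"CPU+GPU efficiency {scaling_efficiency(system, 'gpu', cores):.2f}"
            )
```

With these inputs the CPU-only runs scale close to linearly up to eight cores, while the CPU+GPU runs gain little beyond four cores, so the GPU speedup shrinks as cores are added.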
P. S. 
Here is my input file:
#############################################################
## ADJUSTABLE PARAMETERS                                   ##
#############################################################
structure          ../../kcsa_popcwi.psf
coordinates        ../../kcsa_popcwi.pdb
outputName         kcsa_popcwieq-05_8cpu
set temperature    310
# Continuing a job from the restart files
if {1} {
set inputname      kcsa_popcwieq-04
binCoordinates     ../../step4/$inputname.restart.coor
binVelocities      ../../step4/$inputname.restart.vel  ;# remove the "temperature" entry if you use this!
extendedSystem     ../../step4/$inputname.restart.xsc
}
firsttimestep      752000
#############################################################
## SIMULATION PARAMETERS                                   ##
#############################################################
# Input
paraTypeCharmm      on
parameters          ../../par_all27_prot_lipidNBFIX.prm
if {0} {
cellBasisVector1    98.    0.   0.
cellBasisVector2     0.   98.   0.
cellBasisVector3     0.    0.  96.
cellOrigin          -0.0390621498227 -0.0503903478384 0.05063835904
}
wrapWater           on
wrapAll             on
# Force-Field Parameters
exclude             scaled1-4
1-4scaling          1.0
cutoff              12.
switching           on
switchdist          10.
pairlistdist        13.5
# Integrator Parameters
timestep            2.0  ;# 2fs/step
rigidBonds          all  ;# needed for 2fs steps
nonbondedFreq       1
fullElectFrequency  2
stepspercycle       20
#PME (for full-system periodic electrostatics)
if {1} {
PME                 yes
PMEGridSizeX       100
PMEGridSizeY       100
PMEGridSizeZ       90
}
# Constant Temperature Control
langevin            on    ;# do langevin dynamics
langevinDamping     1     ;# damping coefficient (gamma) of 1/ps
langevinTemp        $temperature
# Constant Pressure Control (variable volume)
if {1} {
useGroupPressure      yes ;# needed for 2fs steps
#useFlexibleCell       yes  ;# no for water box, yes for membrane
#useConstantArea       yes  ;# no for water box, yes for membrane
langevinPiston        on
langevinPistonTarget  1.01325 ;#  in bar -> 1 atm
langevinPistonPeriod  200.
langevinPistonDecay   50.
langevinPistonTemp    $temperature
}
restartfreq        1000     ;# 1000steps = every 2ps
dcdfreq            1000
xstFreq            1000
outputEnergies     100
outputPressure      50
# Fixed Atoms Constraint (set PDB beta-column to 1)
if {0} {
fixedAtoms          on
fixedAtomsFile      nottails.fix.pdb
fixedAtomsCol       B
fixedAtomsForces    on
}
#############################################################
## EXECUTION SCRIPT                                        ##
#############################################################
# Minimization
if {0} {
minimize            1000
reinitvels          $temperature
}
run 5000 ;# 10 ps
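
For reference, the 10 ps run length and a throughput figure in ns/day follow from the `timestep` and `run` values in the input file above. This is a minimal sketch in Python; the helper names `simulated_ps` and `ns_per_day` are illustrative, and it assumes the reported CPUTime/s values can be treated as elapsed time, which may not hold exactly.

```python
def simulated_ps(steps, timestep_fs):
    """Simulated time in picoseconds for a given step count and timestep."""
    return steps * timestep_fs / 1000.0  # 1000 fs per ps


def ns_per_day(sim_ps, seconds):
    """Throughput in ns/day, treating `seconds` as elapsed time (assumption)."""
    sim_ns = sim_ps / 1000.0  # 1000 ps per ns
    return sim_ns * 86400.0 / seconds  # 86400 seconds per day


if __name__ == "__main__":
    # "run 5000" with "timestep 2.0" from the input file
    length_ps = simulated_ps(5000, 2.0)
    print(f"run length: {length_ps} ps")

    # system1 timings (CPUTime/s) for the 8-core runs reported in this post
    for label, seconds in (("8CPU+GPU", 341.784241), ("8CPU", 583.565300)):
        print(f"{label}: {ns_per_day(length_ps, seconds):.2f} ns/day")
```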
                                               
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:54:35 CST