Comparison of GPU and multi-CPU versions of NAMD

From: Jianing Song (sjn_sk_at_hotmail.com)
Date: Sun Oct 10 2010 - 09:51:54 CDT

Dear all,

Recently I used NAMD 2.7b3-CUDA (with the CHARMM force field) to simulate two membrane systems, in order to test the performance of NAMD on a GPU.
The first structure comes from the membrane protein tutorial on the NAMD website and contains 50,000 atoms (PDB ID: 1K4C, system1).
The second is a GPCR and contains 110,000 atoms (PDB ID: 2rh1, system2).
For comparison, I also ran the non-GPU version of NAMD (2.7b3) to measure how much the GPU version accelerates the simulation relative to the multi-CPU version.

Finally, the results are as follows:

Comparison of GPU and CPU

system1: MD time (10 ps), number of atoms (50,000)

    Configuration   CPUTime/s        Configuration   CPUTime/s
    1CPU+GPU        719.888777       1CPU            4204.590820
    2CPU+GPU        495.532307       2CPU            2039.238030
    4CPU+GPU        328.818407       4CPU            1075.074580
    8CPU+GPU        341.784241       8CPU            583.565300

system2: MD time (10 ps), number of atoms (110,000)

    Configuration   CPUTime/s        Configuration   CPUTime/s
    1CPU+GPU        1549.540405      1CPU            10560.066406
    2CPU+GPU        1093.049805      2CPU            5303.254883
    4CPU+GPU        807.013306       4CPU            2698.643799
    8CPU+GPU        763.851868       8CPU            1352.766357

It seems that the GPU version (NAMD-CUDA) gives only limited acceleration: the speedup over the CPU-only runs falls from about 6-7x with one CPU to under 2x with eight CPUs.
Even worse, GPU utilization was only 50%-60% in all of the simulations.
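
For reference, the speedups implied by the table can be worked out with a short script. This is only a sketch in Python: the timings are copied from the table above, while the dictionary layout and the function names (`gpu_speedup`, `cpu_scaling_efficiency`) are chosen here for illustration and are not part of NAMD.

```python
# CPUTime values in seconds, copied from the table above.
# Inner keys are the number of CPU cores used in each run.
timings = {
    "system1": {
        "gpu": {1: 719.888777, 2: 495.532307, 4: 328.818407, 8: 341.784241},
        "cpu": {1: 4204.590820, 2: 2039.238030, 4: 1075.074580, 8: 583.565300},
    },
    "system2": {
        "gpu": {1: 1549.540405, 2: 1093.049805, 4: 807.013306, 8: 763.851868},
        "cpu": {1: 10560.066406, 2: 5303.254883, 4: 2698.643799, 8: 1352.766357},
    },
}


def gpu_speedup(system):
    """Ratio of CPU-only time to CPU+GPU time at each core count."""
    runs = timings[system]
    return {n: runs["cpu"][n] / runs["gpu"][n] for n in runs["cpu"]}


def cpu_scaling_efficiency(system):
    """Parallel efficiency of the CPU-only runs relative to the 1-CPU run."""
    cpu = timings[system]["cpu"]
    return {n: cpu[1] / (n * cpu[n]) for n in cpu}


for system in timings:
    speedups = gpu_speedup(system)
    efficiency = cpu_scaling_efficiency(system)
    for n in sorted(speedups):
        print(f"{system} {n}CPU: GPU speedup {speedups[n]:.2f}x, "
              f"CPU-only efficiency {efficiency[n]:.2f}")
```

The CPU-only runs scale almost linearly up to 8 cores, whereas the CPU+GPU runs stop improving beyond 4 cores, so the GPU advantage shrinks as cores are added.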

So, is this the normal performance of the GPU version of NAMD?
Or is there something wrong with my setup? If so, how can I improve the performance of my runs?

P. S.
Here is my input file:
#############################################################
   ## ADJUSTABLE PARAMETERS ##
#############################################################

structure ../../kcsa_popcwi.psf
coordinates ../../kcsa_popcwi.pdb
outputName kcsa_popcwieq-05_8cpu

set temperature 310

# Continuing a job from the restart files
if {1} {
set inputname kcsa_popcwieq-04
binCoordinates ../../step4/$inputname.restart.coor
binVelocities ../../step4/$inputname.restart.vel ;# remove the "temperature" entry if you use this!
extendedSystem ../../step4/$inputname.restart.xsc
}

firsttimestep 752000

#############################################################
## SIMULATION PARAMETERS ##
#############################################################

# Input
paraTypeCharmm on
parameters ../../par_all27_prot_lipidNBFIX.prm
if {0} {
cellBasisVector1 98. 0. 0.
cellBasisVector2 0. 98. 0.
cellBasisVector3 0. 0. 96.
cellOrigin -0.0390621498227 -0.0503903478384 0.05063835904
}
wrapWater on
wrapAll on
# Force-Field Parameters
exclude scaled1-4
1-4scaling 1.0
cutoff 12.
switching on
switchdist 10.
pairlistdist 13.5

# Integrator Parameters
timestep 2.0 ;# 2fs/step
rigidBonds all ;# needed for 2fs steps
nonbondedFreq 1
fullElectFrequency 2
stepspercycle 20

#PME (for full-system periodic electrostatics)
if {1} {
PME yes
PMEGridSizeX 100
PMEGridSizeY 100
PMEGridSizeZ 90
}

# Constant Temperature Control
langevin on ;# do langevin dynamics
langevinDamping 1 ;# damping coefficient (gamma) of 1/ps
langevinTemp $temperature

# Constant Pressure Control (variable volume)
if {1} {
useGroupPressure yes ;# needed for 2fs steps
#useFlexibleCell yes ;# no for water box, yes for membrane
#useConstantArea yes ;# no for water box, yes for membrane

langevinPiston on
langevinPistonTarget 1.01325 ;# in bar -> 1 atm
langevinPistonPeriod 200.
langevinPistonDecay 50.
langevinPistonTemp $temperature

}

restartfreq 1000 ;# 1000steps = every 2ps
dcdfreq 1000
xstFreq 1000
outputEnergies 100
outputPressure 50

# Fixed Atoms Constraint (set PDB beta-column to 1)
if {0} {
fixedAtoms on
fixedAtomsFile nottails.fix.pdb
fixedAtomsCol B
fixedAtomsForces on
}

#############################################################
## EXECUTION SCRIPT ##
#############################################################

# Minimization
if {0} {
minimize 1000
reinitvels $temperature
}

run 5000 ;# 10 ps
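
As a cross-check on the numbers, the run length in the input file (5000 steps at 2 fs) and the reported times can be converted into a rough throughput in ns/day. This is a sketch in Python; the helper names are made up for illustration, and it assumes the reported CPUTime is close to wall-clock time, which may not hold exactly.

```python
SECONDS_PER_DAY = 86_400


def simulated_ns(steps, timestep_fs):
    """Simulated time in nanoseconds (1 ns = 1e6 fs)."""
    return steps * timestep_fs * 1e-6


def ns_per_day(steps, timestep_fs, seconds):
    """Throughput, treating `seconds` as the wall-clock time of the run (assumption)."""
    return simulated_ns(steps, timestep_fs) * SECONDS_PER_DAY / seconds


# "run 5000" with "timestep 2.0" corresponds to 10 ps = 0.01 ns.
print(simulated_ns(5000, 2.0))

# system1 with 8 CPUs, with and without the GPU (times from the table above).
print(f"8CPU+GPU: {ns_per_day(5000, 2.0, 341.784241):.2f} ns/day")
print(f"8CPU:     {ns_per_day(5000, 2.0, 583.565300):.2f} ns/day")
```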
