CPU/GPU energy difference

From: Boonstra, S. (s.boonstra_at_rug.nl)
Date: Mon Feb 27 2017 - 05:08:28 CST

Dear NAMD-ers,

For a protein of about 7000 atoms, I am interested in the free energy
difference between two conformations. I use the confinement free energy
method (Cecchini, JPCB 113-19, 2009) to get an estimate of the entropy
difference and GPUs to speed up the sampling. The largest contributor to
the enthalpy difference is the potential energy difference between the two
reference structures after minimization.

After doing 20k steps of energy minimization of each state on only CPUs
(NAMD2.12b1_Linux-x86_64-multicore), I continued on GPUs (multicore-CUDA)
using the same input file and passing either binary or pdb coordinates for
the continuation. I found that for one of the states (A), the energy
changed by 400 kcal/mol, turning the transition from exothermic into endothermic.

System     CPU energy (kcal/mol)    GPU after CPU (kcal/mol)
State A          -18839                    -19403
State B          -19210                    -19200
deltaE             -371                       203
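
The sign flip can be checked directly from the table. The following is a minimal sketch, assuming deltaE is taken as E(B) - E(A), a convention that is not stated explicitly but reproduces the tabulated values:

```python
# Energies in kcal/mol, copied from the table above.
energies = {
    "CPU": {"A": -18839.0, "B": -19210.0},
    "GPU after CPU": {"A": -19403.0, "B": -19200.0},
}

def delta_e(states):
    """Return E(B) - E(A); this sign convention is an assumption that matches the table."""
    return states["B"] - states["A"]

for platform, states in energies.items():
    print(f"{platform:14s} deltaE = {delta_e(states):+.0f} kcal/mol")

# Shift of each state when the same structure is re-evaluated on the GPU.
shift = {s: energies["GPU after CPU"][s] - energies["CPU"][s] for s in ("A", "B")}
print(shift)
```

The per-state shift makes it easy to see that state A moves far more than state B when the platform changes.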

I use Amber ff14 with Generalized Born Implicit Solvation (sasa = off) and
the energy difference can be attributed to a large jump in the
electrostatic potential when using GPUs (see 3,4,5 below). It happens both
when I continue the minimization and when I start a dynamics run after
minimization. When starting a new minimization entirely on GPUs, both the
electrostatic and VdW potential differ from the CPU in the first steps (see
1,2 below), and the resulting minimal energy after 40k steps is also
significantly different (-17672 - (-17453) = -219 kcal/mol).

I have seen this on different clusters, using Tesla K40m, Titan Black or a
Quadro K620 on my workstation. I have found some clues in earlier threads
about forces being evaluated in single precision on GPUs and trajectories
diverging between CPU and GPU (see below), but from that I would not
expect such a big difference in energy between identical structures.
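
To get a feeling for the size of single-precision and summation-order effects, here is a small illustrative sketch. It is not NAMD's actual GPU kernel: the 10,000 random "pair energies" are hypothetical, and single precision is emulated with the standard `struct` module.

```python
import random
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate in emulated single precision, rounding after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

random.seed(0)
# Hypothetical pair energies: 10,000 terms of mixed sign, magnitude up to 1.
terms = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
shuffled = terms[:]
random.shuffle(shuffled)

reference = sum(terms)            # double precision, original order
single_fwd = sum_f32(terms)       # single precision, original order
single_shuf = sum_f32(shuffled)   # single precision, different order

print("double precision sum:", reference)
print("single - double     :", single_fwd - reference)
print("order effect        :", single_shuf - single_fwd)
```

In a toy sum like this the discrepancies stay well below one unit, which is in line with the expectation that rounding noise alone should not produce a shift of hundreds of kcal/mol for identical coordinates.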

Because this involves unpublished results and structures, I am hesitant to
put all the in- and output files publicly online. I do have them readily
available on request. The configuration file is included below.

Is this expected behaviour or am I doing something wrong? Does anyone have
suggestions on how I should proceed to reliably calculate the potential
energy difference between two (deeply) minimized structures?

I appreciate your help.

Cheers,

Sander

Graph of a short test run:
https://drive.google.com/file/d/0B3C0R46v85EjbEpIa3pLUVlIQVE/view?usp=sharing

1. CPU minimization start:
ENERGY: 0 547.7793 3927.2209 5592.3175
0.0000 *-22071.8261* *18275.6645* 0.0000
0.0000 0.0000 * 6271.1560* 0.0000
6271.1560 6271.1560 0.0000

2. GPU minimization start:
ENERGY: 0 547.7793 3927.2209 5592.3175
0.0000 *-21803.2737* *15692.8708* 0.0000
0.0000 0.0000 * 3956.9148* 0.0000
3956.9148 3956.9148 0.0000

3. CPU minimization end:
ENERGY: 1000 254.0078 882.3606 5251.7153
0.0000 *-22928.0693* -2336.4909 0.0000
0.0000 0.0000 *-18876.4764* 0.0000
-18876.4764 -18876.4764 0.0000

4. CPU minimization start of continuation:
ENERGY: 0 254.0078 882.3606 5251.7153
0.0000 *-22928.0691* -2336.4909 0.0000
0.0000 0.0000 *-18876.4763* 0.0000
-18876.4763 -18876.4763 0.0000

5. GPU minimization start of continuation:
ENERGY: 0 254.0078 882.3606 5251.7153
0.0000 *-23248.8540* -2336.2673 0.0000
0.0000 0.0000 *-19197.0376* 0.0000
-19197.0376 -19197.0376 0.0000
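
The records above can be compared term by term with a short script. This is a sketch only; the column order is an assumption taken from NAMD's usual ETITLE line when pressure output is off, which matches the 15 values in each record. The asterisks are the email's emphasis marks and are stripped before parsing.

```python
# Column order assumed from NAMD's standard ETITLE line without pressure output.
ETITLE = ("TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC "
          "TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG").split()

def parse_energy(record):
    """Parse one (possibly line-wrapped) ENERGY record into a dict of floats."""
    fields = record.replace("*", " ").split()
    if not fields or fields[0] != "ENERGY:":
        raise ValueError("not an ENERGY record")
    values = [float(f) for f in fields[1:]]
    if len(values) != len(ETITLE):
        raise ValueError(f"expected {len(ETITLE)} values, got {len(values)}")
    return dict(zip(ETITLE, values))

# Record 4: CPU, start of continuation.
cpu = parse_energy("""ENERGY: 0 254.0078 882.3606 5251.7153
0.0000 *-22928.0691* -2336.4909 0.0000
0.0000 0.0000 *-18876.4763* 0.0000
-18876.4763 -18876.4763 0.0000""")

# Record 5: GPU, start of continuation from the same coordinates.
gpu = parse_energy("""ENERGY: 0 254.0078 882.3606 5251.7153
0.0000 *-23248.8540* -2336.2673 0.0000
0.0000 0.0000 *-19197.0376* 0.0000
-19197.0376 -19197.0376 0.0000""")

# Print only the terms that differ between the two platforms.
for term in ETITLE[1:]:
    diff = gpu[term] - cpu[term]
    if abs(diff) > 1e-4:
        print(f"{term:10s} {diff:+12.4f}")
```

For records 4 and 5 the bonded terms are identical, VDW differs by about 0.2 kcal/mol, and ELECT carries essentially the whole shift of roughly -320.8 kcal/mol.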

“Forces evaluated on the GPU differ slightly from a CPU-only calculation,
an effect more visible in reported scalar pressure values than in energies.”
(http://www.ks.uiuc.edu/Research/namd/2.9/ug/node88.html)

“Since you'll be doing floating point math and summing up numbers of
different magnitude in different order, trajectories will diverge
eventually.”
Re: CPU GPU comparison
(http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2009-2010/3402.html)

“No surprise that GPU runs, particularly of very large systems, don't
conserve energy as well as CPU runs. On the GPU you compute forces in
single precision and thus have more "noise" in your system.”
Re: AW: Consistent temperature increase in CUDA runs
(http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2011-2012/2669.html)

#############################################################
## JOB DESCRIPTION                                         ##
## AMBER parameters from                                   ##
## http://ambermd.org/namd/namd_amber.html                 ##
#############################################################

set system mySystem

# Derive the input/output suffixes from the number of existing .dcd files
set num1 ""
set num2 ""
set files [llength [glob -nocomplain *.dcd]]
if {$files > 0} {
    set num2 [expr {$files + 1}]
    if {$files > 1} {set num1 $files}
}
set previous ${system}_em${num1}
set current ${system}_em${num2}

amber on
parmfile ${system}.prmtop
#ambercoor ${system}.inpcrd
coordinates ./${system}_em.coor ;# mandatory, but ignored when bincoordinates present
bincoordinates ./$previous.coor

set temperature 300
set outputname $current
binaryoutput yes
firsttimestep 0

#############################################################
## SIMULATION PARAMETERS                                   ##
#############################################################

# Input
paraTypeCharmm on
temperature $temperature

# Implicit Solvent
gbis on
alphaCutoff 12.0
ionConcentration 0.15

# Force-Field Parameters
exclude scaled1-4
1-4scaling 0.833333333
scnb 2.0
readexclusions yes
cutoff 14.0
switching off
#switchdist 13.0
pairlistdist 16.0

# Integrator Parameters
timestep 2.0 ;# 2 fs/step
rigidBonds all ;# needed for 2 fs steps
rigidTolerance 1.0e-8
rigidIterations 100
nonbondedFreq 1
fullElectFrequency 1
stepspercycle 10

# Output
outputName $outputname
restartfreq 500000 ;# 500000 steps = every 1 ns
dcdfreq 10
xstFreq 500
outputEnergies 500
outputPressure 0

#############################################################
## EXECUTION SCRIPT                                        ##
#############################################################

# Minimization
minTinyStep 1.0e-7
minBabyStep 1.0e-3
minimization on
minimize 20000
