Moving atoms with bad contacts followed by CUDA error

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Nov 08 2012 - 09:45:00 CST

Hi:
Please disregard my previous mail about the CUDA error. The information
is now more clear-cut: VDW problems accompanied by a CUDA error.

Still with the Linux CUDA version of NAMD 2.9, I prepared a new psf/pdb
for this heme protein in a water box, neutralized and with 0.15 M NaCl.
This time I first eliminated an H2O molecule that I thought could give
rise to high VDW energies.
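Here is a rough sense of why one misplaced water can matter. CHARMM writes the pair Lennard-Jones energy as E = eps_ij [(Rmin_ij/r)^12 - 2 (Rmin_ij/r)^6]. In that expression eps_ij is the geometric mean of the two well depths, and Rmin_ij is the sum of the two Rmin/2 values, which is the arithmetic-mean combination the log reports. The sketch below evaluates it for an oxygen-oxygen pair. It assumes the CHARMM TIP3P oxygen (OT) values eps = 0.1521 kcal/mol and Rmin/2 = 1.7682 A; check them against the parameter file actually used. The sketch ignores the 10-12 A switching, which does not affect short distances.

```python
import math

def charmm_lj(r, eps_i, rmin_half_i, eps_j, rmin_half_j):
    """CHARMM-form Lennard-Jones pair energy in kcal/mol (no switching).

    eps_* are well depths given as positive numbers (CHARMM parameter files
    list them as negative); rmin_half_* are Rmin/2 values in angstroms.
    """
    eps = math.sqrt(eps_i * eps_j)        # geometric mean of the well depths
    rmin = rmin_half_i + rmin_half_j      # Rmin_ij = Rmin_i/2 + Rmin_j/2
    x6 = (rmin / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)     # minimum of -eps at r = rmin

# Assumed CHARMM TIP3P oxygen (OT) parameters; verify them in your .prm file.
EPS_OT, RMIN_HALF_OT = 0.1521, 1.7682

for r in (3.5364, 3.0, 2.5, 2.0, 1.5, 1.0):
    e = charmm_lj(r, EPS_OT, RMIN_HALF_OT, EPS_OT, RMIN_HALF_OT)
    print(f"O-O at {r:.2f} A: {e:14.1f} kcal/mol")
```

At 1 A this single pair already contributes more than 5 x 10^5 kcal/mol. A handful of such overlaps is therefore enough to reach the step-0 VDW energy of about 3.1 x 10^6 kcal/mol in the log below. The log alone does not say which atoms are involved.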

Minimization without any restraint on any atom, with ts=1fs, margin=3,
outputEnergies=10, ran for all 10,000 steps, although it complained
that energies were rising too much.
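The two minimizations differ in margin: 3 here, and 15 in the run logged below. In NAMD, the margin adds extra length to the patches used for spatial decomposition. The arithmetic below is a sketch built on two assumptions. First, patch dimension = pairlist distance + hydrogen-group cutoff + margin. Second, patches per axis = floor(cell length / patch dimension). Together these reproduce the PATCH DIMENSION 31 and the 2 x 2 x 2 PATCH GRID reported in the log. The sketch also assumes that the margin=3 run used the same pairlist distance (13.5 A), which the post does not state.

```python
import math

# Values copied from the "Info:" lines of the margin=15 log.
PAIRLIST_DIST = 13.5           # Info: PAIRLIST DISTANCE 13.5
HGROUP_CUTOFF = 2.5            # Info: HYDROGEN GROUP CUTOFF 2.5
CELL = (81.8, 78.64, 82.31)    # Info: PERIODIC CELL BASIS 1..3 (orthorhombic)
N_ATOMS = 49683                # Info: 49683 ATOMS

def patch_dimension(margin):
    # Assumed relation; it reproduces "PATCH DIMENSION 31" for margin 15.
    return PAIRLIST_DIST + HGROUP_CUTOFF + margin

def patch_grid(margin):
    # Assumed rule: patches per axis = floor(cell length / patch dimension),
    # never fewer than one.
    dim = patch_dimension(margin)
    return tuple(max(1, math.floor(length / dim)) for length in CELL)

for margin in (3, 15):
    grid = patch_grid(margin)
    n_patches = grid[0] * grid[1] * grid[2]
    print(f"margin={margin:>2}: patch dim {patch_dimension(margin):.1f} A, "
          f"grid {grid}, {n_patches} patches, "
          f"~{N_ATOMS / n_patches:.0f} atoms/patch")
```

With margin=15, the 49,683 atoms fall into only eight patches, averaging about 6,200 atoms each. This is consistent with the log's "LARGEST PATCH (4) HAS 6301 ATOMS". The sketch only describes the decomposition and does not by itself explain the CUDA error.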

A new minimization with ts=0.1fs, margin=15, outputEnergies=10000
started by moving atoms with bad contacts, but soon stopped because of
a CUDA error. THIS IS THE VERY REASON TO POST AGAIN. In the past I have
had many cases where dozens of atoms with bad contacts were relocated
and the minimizer was able to put the system in order. Perhaps, however,
that was with the non-CUDA version. NOTICE THAT VARIOUS TESTS DID NOT
REVEAL ANY SERIOUS CLASHES WITH THIS SYSTEM.
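The sketch below shows one kind of clash test: a brute-force search for atom pairs closer than a cutoff. It applies the minimum-image convention for the orthorhombic cell printed in the log (81.8 x 78.64 x 82.31 A). The function names and the 1.5 A cutoff are illustrative assumptions, not a NAMD or VMD tool. The sketch assumes coordinates have already been read from the PDB, and it knows nothing about bonds.

```python
import itertools
import math

def min_image(d, box):
    # Wrap one Cartesian separation into roughly [-box/2, box/2].
    return d - box * round(d / box)

def close_contacts(coords, box, cutoff=1.5):
    """Return (i, j, r) for every atom pair closer than `cutoff` angstroms.

    coords: list of (x, y, z); box: (Lx, Ly, Lz) of an orthorhombic cell.
    Brute force O(N^2): fine for a sketch, far too slow for ~50,000 atoms,
    where a cell list would be needed. Real use must also skip bonded pairs
    (and water O-H/H-H pairs), which are legitimately closer than 1.5 A.
    """
    hits = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        r = math.sqrt(sum(min_image(a[k] - b[k], box[k]) ** 2 for k in range(3)))
        if r < cutoff:
            hits.append((i, j, r))
    return hits

box = (81.8, 78.64, 82.31)
coords = [(0.5, 10.0, 10.0), (81.0, 10.0, 10.0), (40.0, 40.0, 40.0)]
for i, j, r in close_contacts(coords, box):
    # Atoms 0 and 1 are close only through the periodic x boundary.
    print(f"atoms {i} and {j}: {r:.2f} A")
```

Ignoring periodicity would miss exactly this kind of contact across the box edge, which is a common blind spot of simple clash checks.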

With the previous preparation of this system (the only difference was
that all crystallization water was retained, whereas now I have
eliminated one H2O, as said above), minimization was OK, while heating
was accompanied by high energies and a CUDA error.

Below is the log:

Running command: namd2 min-02.conf +p6 +idlepoll

Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD CVS-2012-09-26 for Linux-x86_64-multicore-CUDA
.............
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Wed Sep 26 02:25:08 CDT 2012 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2012-09-26 Linux-x86_64-multicore-CUDA 6 gig64 francesco
Info: Running on 6 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.010206 s
Pe 3 physical rank 3 will use CUDA device of pe 4
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 will use CUDA device of pe 2
Info: 8.36328 MB of memory in use based on /proc/self/stat
Info: Configuration file is min-02.conf
Info: Working in the current directory /home/francesco/work_heme-oxygenase/MD
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 0.1
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 10
Info: PERIODIC CELL BASIS 1 81.8 0 0
Info: PERIODIC CELL BASIS 2 0 78.64 0
Info: PERIODIC CELL BASIS 3 0 0 82.31
Info: PERIODIC CELL CENTER -100.628 -13.9086 -83.6853
Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
Info: WRAPPING TO IMAGE NEAREST TO PERIODIC CELL CENTER.
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 2000 steps
Info: FIRST LDB TIMESTEP 50
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MIN ATOMS PER PATCH 40
Info: INITIAL TEMPERATURE 0
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 1
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: NO DCD TRAJECTORY OUTPUT
Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME ./min-02
Info: RESTART FILENAME ./min-02.restart
Info: RESTART FREQUENCY 100
Info: BINARY RESTART FILES WILL BE USED
Info: SWITCHING ACTIVE
Info: SWITCHING ON 10
Info: SWITCHING OFF 12
Info: PAIRLIST DISTANCE 13.5
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 15
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 31
Info: ENERGY OUTPUT STEPS 10000
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 100000
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 0
Info: LANGEVIN USING BBK INTEGRATOR
Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.257952
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 90 90 90
Info: PME MAXIMUM GRID SPACING 1
Info: Attempting to read FFTW data from FFTW_NAMD_CVS-2012-09-26_Linux-x86_64-multicore-CUDA.txt
Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
Info: Writing FFTW data to FFTW_NAMD_CVS-2012-09-26_Linux-x86_64-multicore-CUDA.txt
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 5
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : WATER
Info: ERROR TOLERANCE : 1e-06
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: RANDOM NUMBER SEED 12371
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB ./4G7L_heme_O2free_WTS_WTBOX_ION.pdb
Info: STRUCTURE FILE ./4G7L_heme_O2free_WTS_WTBOX_ION.psf
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS ./par_all27_prot_lipid.prm
Info: PARAMETERS ./toppar_all22_prot_heme.str
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: SKIPPING rtf SECTION IN STREAM FILE
Info: SUMMARY OF PARAMETERS:
Info: 185 BONDS
Info: 467 ANGLES
Info: 601 DIHEDRAL
Info: 47 IMPROPER
Info: 6 CROSSTERM
Info: 121 VDW
Info: 0 VDW_PAIRS
Info: 0 NBTHOLE_PAIRS
Info: TIME FOR READING PSF FILE: 0.233919
Info: TIME FOR READING PDB FILE: 0.0665121
Info:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 49683 ATOMS
Info: 34285 BONDS
Info: 21814 ANGLES
Info: 9486 DIHEDRALS
Info: 641 IMPROPERS
Info: 211 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 46068 RIGID BONDS
Info: 102981 DEGREES OF FREEDOM
Info: 17231 HYDROGEN GROUPS
Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
Info: 17231 MIGRATION GROUPS
Info: 4 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 304545 amu
Info: TOTAL CHARGE = 2.58163e-06 e
Info: MASS DENSITY = 0.955127 g/cm^3
Info: ATOM DENSITY = 0.0938336 atoms/A^3
Info: *****************************
Info:
Info: Entering startup at 0.380281 s, 27.8633 MB of memory in use
Info: Startup phase 0 took 0.00011301 s, 27.9531 MB of memory in use
Info: ADDED 65395 IMPLICIT EXCLUSIONS
Info: Startup phase 1 took 0.02086 s, 36.7266 MB of memory in use
Info: Startup phase 2 took 8.70228e-05 s, 36.7383 MB of memory in use
Info: Startup phase 3 took 5.19753e-05 s, 36.7383 MB of memory in use
Info: Startup phase 4 took 0.000784159 s, 39.1328 MB of memory in use
Info: Startup phase 5 took 5.6982e-05 s, 39.3281 MB of memory in use
Info: PATCH GRID IS 2 (PERIODIC) BY 2 (PERIODIC) BY 2 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: REMOVING COM VELOCITY 0 0 0
Info: LARGEST PATCH (4) HAS 6301 ATOMS
Info: Startup phase 6 took 0.0113049 s, 48.5742 MB of memory in use
Info: PME using 6 and 6 processors for FFT and reciprocal sum.
Info: PME USING 1 GRID NODES AND 1 TRANS NODES
Info: PME GRID LOCATIONS: 0 1 2 3 4 5
Info: PME TRANS LOCATIONS: 0 1 2 3 4 5
Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
Info: Startup phase 7 took 0.00144601 s, 50.9375 MB of memory in use
Info: Startup phase 8 took 0.000285149 s, 51.1875 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 9 took 0.000597954 s, 51.2227 MB of memory in use
Info: CREATING 210 COMPUTE OBJECTS
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000324844 AT 11.9556
Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
Pe 2 hosts 1 local and 1 remote patches for pe 2
Pe 0 hosts 0 local and 1 remote patches for pe 2
Pe 3 hosts 1 local and 1 remote patches for pe 2
Pe 1 hosts 1 local and 0 remote patches for pe 2
Pe 5 hosts 1 local and 0 remote patches for pe 2
Pe 3 hosts 1 local and 1 remote patches for pe 4
Pe 0 hosts 0 local and 1 remote patches for pe 4
Pe 5 hosts 1 local and 0 remote patches for pe 4
Pe 1 hosts 1 local and 0 remote patches for pe 4
Pe 2 hosts 1 local and 1 remote patches for pe 4
Pe 4 hosts 0 local and 1 remote patches for pe 4
Pe 4 hosts 0 local and 1 remote patches for pe 2
Info: useSync: 1 useProxySync: 0
Info: Startup phase 10 took 0.11604 s, 92.1406 MB of memory in use
Info: Startup phase 11 took 8.29697e-05 s, 92.2812 MB of memory in use
Info: Startup phase 12 took 5.50747e-05 s, 92.2812 MB of memory in use
Info: Finished startup at 0.532046 s, 92.2891 MB of memory in use

TCL: Minimizing for 10000 steps
Pe 2 has 4 local and 4 remote patches and 54 local and 54 remote computes.
Pe 4 has 4 local and 4 remote patches and 54 local and 54 remote computes.
ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG

ENERGY: 0 91552.8951 13357.2602 696.2789 619.3638 -161092.3538 3090319.8563 0.0000 0.0000 0.0000 3035453.3005 0.0000 3035453.3005 3035453.3005 0.0000 1209906.1486 1224187.1900 529479.8171 1209906.1486 1224187.1900

MINIMIZER SLOWLY MOVING 173 ATOMS WITH BAD CONTACTS DOWNHILL
ENERGY: 1 91713.1325 13354.0057 696.0843 619.2720 -28734.4535 28182124.8631 0.0000 0.0000 0.0000 28259772.9041 0.0000 28259772.9041 28259772.9041 0.0000 -1481223.6909 -1465406.2454 529479.8171 -1481223.6909 -1465406.2454

MINIMIZER SLOWLY MOVING 131 ATOMS WITH BAD CONTACTS DOWNHILL
ENERGY: 2 91937.8080 13372.9520 695.8552 619.3279 -44353.3540 4618829.7986 0.0000 0.0000 0.0000 4681102.3876 0.0000 4681102.3876 4681102.3876 0.0000 -272706.5356 -255252.9501 529479.8171 -272706.5356 -255252.9501

MINIMIZER SLOWLY MOVING 82 ATOMS WITH BAD CONTACTS DOWNHILL
ENERGY: 3 92203.1494 13413.8671 695.8489 619.1154 -47093.3155 1260183.8735 0.0000 0.0000 0.0000 1320022.5388 0.0000 1320022.5388 1320022.5388 0.0000 -92382.3634 -75053.0063 529479.8171 -92382.3634 -75053.0063

MINIMIZER SLOWLY MOVING 75 ATOMS WITH BAD CONTACTS DOWNHILL
FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 2 (gig64 device 0): unspecified launch failure
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 2 (gig64 device 0): unspecified launch failure

Charm++ fatal error:
FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 2 (gig64 device 0): unspecified launch failure
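To follow a minimization from a log like this one, a small parser can match each ENERGY: record against the ETITLE: column names. It can also collect the "MINIMIZER SLOWLY MOVING n ATOMS" counts. The helpers below are hypothetical and not part of NAMD. They read whitespace-separated tokens, so records wrapped across lines by a mail client parse the same as single-line ones. They assume every ENERGY: record carries one value per ETITLE column.

```python
import re

def parse_namd_energies(log_text):
    """Return one dict per ENERGY: record, keyed by the ETITLE: column names."""
    tokens = log_text.split()
    titles, records = [], []
    i = 0
    while i < len(tokens):
        if tokens[i] == "ETITLE:":
            # Column names run until the next "KEYWORD:" token (e.g. ENERGY:).
            titles = []
            i += 1
            while i < len(tokens) and not tokens[i].endswith(":"):
                titles.append(tokens[i])
                i += 1
        elif tokens[i] == "ENERGY:" and titles:
            values = [float(t) for t in tokens[i + 1:i + 1 + len(titles)]]
            records.append(dict(zip(titles, values)))
            i += 1 + len(titles)
        else:
            i += 1
    return records

def bad_contact_counts(log_text):
    # Atom counts from "MINIMIZER SLOWLY MOVING n ATOMS WITH BAD CONTACTS" lines.
    pattern = r"MINIMIZER\s+SLOWLY\s+MOVING\s+(\d+)\s+ATOMS"
    return [int(n) for n in re.findall(pattern, log_text)]

# Excerpt of the log above, deliberately left wrapped as in the mail archive.
sample = """
ETITLE: TS BOND ANGLE DIHED
IMPRP ELECT VDW BOUNDARY MISC
       KINETIC TOTAL TEMP POTENTIAL
  TOTAL3 TEMPAVG PRESSURE GPRESSURE
VOLUME PRESSAVG GPRESSAVG

ENERGY: 0 91552.8951 13357.2602 696.2789
619.3638 -161092.3538 3090319.8563 0.0000
0.0000 0.0000 3035453.3005 0.0000
3035453.3005 3035453.3005 0.0000 1209906.1486
1224187.1900 529479.8171 1209906.1486 1224187.1900

MINIMIZER SLOWLY MOVING 173 ATOMS WITH BAD CONTACTS DOWNHILL
ENERGY: 1 91713.1325 13354.0057 696.0843
619.2720 -28734.4535 28182124.8631 0.0000
0.0000 0.0000 28259772.9041 0.0000
28259772.9041 28259772.9041 0.0000 -1481223.6909
-1465406.2454 529479.8171 -1481223.6909 -1465406.2454

MINIMIZER SLOWLY MOVING 131 ATOMS WITH BAD CONTACTS DOWNHILL
"""

for rec in parse_namd_energies(sample):
    print(int(rec["TS"]), rec["VDW"], rec["ELECT"], rec["POTENTIAL"])
print(bad_contact_counts(sample))
```

Applied to the full log, the VDW column goes from about 3.09 x 10^6 kcal/mol at step 0 to about 2.82 x 10^7 at step 1. It then falls back to about 1.26 x 10^6 at step 3. Over the same steps, the number of atoms flagged with bad contacts drops from 173 to 75 before the CUDA failure.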

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:44 CST