RE: CUDA-accelerated NAMD does not use the video card?

From: Lee Wei Yang (lwyang_at_ljbi.org)
Date: Fri Nov 13 2009 - 18:15:27 CST

Dear all,

I think the newly released 2.7b2 has fixed the problem, although 4 CPUs + 3 GPUs + 1 video card is not any faster than 3 CPUs + 3 GPUs. Maybe that is because the GPUs have to wait for the slower video card?

Lee
________________________________________
From: owner-namd-l_at_ks.uiuc.edu [owner-namd-l_at_ks.uiuc.edu] On Behalf Of Lee Wei Yang [lwyang_at_ljbi.org]
Sent: Friday, November 13, 2009 2:55 PM
To: NAMD
Subject: namd-l: CUDA-accelerated NAMD does not use the video card?

Dear NAMD users,

I was able to use the 3 Tesla GPU cards I have (NVT ensemble, 42000+ atoms), but not the 3 Tesla GPU cards plus 1 video card (Quadro FX 3700).
From the log, it seems that the video card is at least twice as slow as the Tesla cards and that this causes the error "ERROR: Constraint failure in RATTLE algorithm for atom 5131!" I have two quad-core Nehalem CPUs in the same machine as the 3 GPUs.

I ran it as:

/root/Software/NAMD_CVS_Linux-x86_64-CUDA/charmrun /root/Software/NAMD_CVS_Linux-x86_64-CUDA/namd2 ++local +idlepoll +p4 +devices 0,1,2,3 ./heat1.conf > log
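According to the "binding to CUDA device" lines in the log below, device 2 is the Quadro FX 3700, so a Tesla-only run would presumably pass +devices 0,1,3. The following is a minimal sketch of deriving that list from an earlier log rather than typing it by hand. The helper name devices_matching is hypothetical and not part of NAMD; it assumes the "binding to CUDA device <n> on <host>: '<name>'" line format seen in this log.

```shell
# List the CUDA device indices whose log line matches a name pattern, as a
# comma-separated string suitable for NAMD's +devices option.
# $1 = device name pattern (e.g. Tesla), $2 = an earlier NAMD log file.
# Hypothetical helper: it only parses "binding to CUDA device" lines.
devices_matching() {
    awk -v pat="$1" '
        /binding to CUDA device/ && $0 ~ pat {
            # The device index is the word right after "device".
            for (i = 1; i < NF; i++)
                if ($i == "device") print $(i + 1)
        }' "$2" | sort -n | paste -sd, -
}
```

With the log from this post, devs=$(devices_matching Tesla log) should give 0,1,3, which could then replace "+p4 +devices 0,1,2,3" with "+p3 +devices $devs" in the command above (assuming one process per device, as in the original run).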

I attach the log at the bottom. Is it a known issue that NAMD does not use a video card, or what is the reason that it saw my video card but was not able to use it?

Also, I tried to run 3 CPUs + 3 GPUs for the ApoA1 benchmark, and it gave this fatal error: "FATAL ERROR: CUDA-accelerated NAMD does not support NBFIX terms in parameter file." What happened? Any comment will help. Thanks.
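One quick way to see which parameter file triggers that message is to look for the NBFIX keyword in each file. This is only a rough sketch: nbfix_keyword_count is a hypothetical helper name, and it merely counts lines whose first word begins with NBFI (case-insensitive). It does not parse the parameter format, so a nonzero count only indicates that the keyword is present.

```shell
# Count lines in a parameter file whose first word starts with "NBFI"
# (case-insensitive). Lines where the keyword only appears after a "!"
# comment marker are not counted, because their first word differs.
# Hypothetical helper; it does not validate or parse the file format.
nbfix_keyword_count() {
    awk 'toupper($1) ~ /^NBFI/ { n++ } END { print n + 0 }' "$1"
}
```

Running nbfix_keyword_count on each file named by a "parameters" line in the benchmark configuration would show which of them mention NBFIX at all.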

Lee

[lwy1_at_AMAX-mm1 Wat]# cat log
Charm warning> Randomization of stack pointer is turned on in Kernel, run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it. Thread migration may not work!
Charm++> cpu topology info is being gathered.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Info: NAMD CVS for Linux-x86_64-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60103 for net-linux-x86_64-iccstatic
Info: Built Thu Nov 5 02:16:47 CST 2009 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS Linux-x86_64-CUDA 4 AMAX-mm1 root
Info: Running on 4 processors.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00517607 s
Pe 1 physical rank 1 binding to CUDA device 1 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 3 physical rank 3 binding to CUDA device 3 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 0 physical rank 0 binding to CUDA device 0 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 2 physical rank 2 binding to CUDA device 2 on AMAX-mm1: 'Quadro FX 3700' Mem: 511MB Rev: 1.1
Info: 1.63618 MB of memory in use based on CmiMemoryUsage
Info: Changed directory to .
Info: Configuration file is heat1.conf
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 2
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 10
Info: PERIODIC CELL BASIS 1 84.5 0 0
Info: PERIODIC CELL BASIS 2 0 81.2 0
Info: PERIODIC CELL BASIS 3 0 0 77
Info: PERIODIC CELL CENTER 0 0 0
Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
Info: LOAD BALANCE STRATEGY New Load Balancers -- ASB
Info: LDB PERIOD 2000 steps
Info: FIRST LDB TIMESTEP 50
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MAX SELF PARTITIONS 1
Info: MAX PAIR PARTITIONS 1
Info: SELF PARTITION ATOMS 154
Info: SELF2 PARTITION ATOMS 154
Info: PAIR PARTITION ATOMS 318
Info: PAIR2 PARTITION ATOMS 637
Info: MIN ATOMS PER PATCH 100
Info: INITIAL TEMPERATURE 100
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 SCALE FACTOR 1
Info: DCD FILENAME heating1.dcd
Info: DCD FREQUENCY 250
Info: DCD FIRST STEP 250
Info: DCD FILE WILL CONTAIN UNIT CELL DATA
Info: XST FILENAME heating1.xst
Info: XST FREQUENCY 250
Info: NO VELOCITY DCD OUTPUT
Info: OUTPUT FILENAME heating1
Info: RESTART FILENAME heating1.restart
Info: RESTART FREQUENCY 50
Info: BINARY RESTART FILES WILL BE USED
Info: SWITCHING ACTIVE
Info: SWITCHING ON 18
Info: SWITCHING OFF 20
Info: PAIRLIST DISTANCE 22
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 24.5
Info: ENERGY OUTPUT STEPS 10
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 250
Info: PRESSURE OUTPUT STEPS 10
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 100
Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.150787
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 64 64 64
Info: PME MAXIMUM GRID SPACING 1.5
Info: Attempting to read FFTW data from FFTW_NAMD_CVS_Linux-x86_64-CUDA.txt
Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
Info: Writing FFTW data to FFTW_NAMD_CVS_Linux-x86_64-CUDA.txt
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : ALL
Info: ERROR TOLERANCE : 1e-08
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: RANDOM NUMBER SEED 1258147825
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB ./min4.pdb
Info: STRUCTURE FILE ./DOPC72_w10.psf
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS ./par_all27_lipid.prm
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: SUMMARY OF PARAMETERS:
Info: 43 BONDS
Info: 103 ANGLES
Info: 115 DIHEDRAL
Info: 3 IMPROPER
Info: 0 CROSSTERM
Info: 34 VDW
Info: 0 VDW_PAIRS
Info: TIME FOR READING PSF FILE: 0.195251
Info: TIME FOR READING PDB FILE: 0.073735
Info:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 46437 ATOMS
Info: 34198 BONDS
Info: 31031 ANGLES
Info: 26136 DIHEDRALS
Info: 144 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 42549 RIGID BONDS
Info: 96762 DEGREES OF FREEDOM
Info: 16055 HYDROGEN GROUPS
Info: TOTAL MASS = 275795 amu
Info: TOTAL CHARGE = 9.38773e-06 e
Info: MASS DENSITY = 0.866846 g/cm^3
Info: ATOM DENSITY = 0.0878943 atoms/A^3
Info: *****************************
Info:
Info: Entering startup at 8.58487 s, 13.3508 MB of memory in use
Info: Startup phase 0 took 0.000151157 s, 13.3523 MB of memory in use
Pairs: 0
Info: Startup phase 1 took 0.05743 s, 22.3762 MB of memory in use
Info: Startup phase 2 took 0.00048089 s, 22.7362 MB of memory in use
Info: PATCH GRID IS 3 (PERIODIC) BY 3 (PERIODIC) BY 3 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: REMOVING COM VELOCITY 0.00769141 -0.0136897 -0.0258227
Info: LARGEST PATCH (13) HAS 2156 ATOMS
Info: CREATING 572 COMPUTE OBJECTS
Info: Startup phase 3 took 0.018611 s, 28.1172 MB of memory in use
Info: PME using 4 and 4 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 0 1 2 3
Info: PME TRANS LOCATIONS: 0 1 2 3
Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
Info: Startup phase 4 took 0.00072813 s, 28.6361 MB of memory in use
Info: Startup phase 5 took 0.0132589 s, 25.0001 MB of memory in use
LDB: Measuring processor speeds ... Done.
Info: Startup phase 6 took 1.20039 s, 24.7086 MB of memory in use
Info: CREATING 572 COMPUTE OBJECTS
Info: useSync: 1 useProxySync: 0
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 833 POINTS
create ComputeNonbondedCUDA
Updated CUDA force table with 8192 elements.
create ComputeNonbondedCUDA
create ComputeNonbondedCUDA
create ComputeNonbondedCUDA
Pe 2 found 67 unique exclusion lists needing 244 bytes
Pe 1 found 67 unique exclusion lists needing 244 bytes
Pe 3 found 67 unique exclusion lists needing 244 bytes
Pe 0 found 67 unique exclusion lists needing 244 bytes
Info: Startup phase 7 took 0.0099709 s, 25.7216 MB of memory in use
Info: Startup phase 8 took 9.10759e-05 s, 26.7311 MB of memory in use
Info: Finished startup at 9.88598 s, 26.7311 MB of memory in use

REINITIALIZING VELOCITIES AT STEP 0 TO 100 KELVIN.
TCL: Setting parameter langevinTemp to 100
TCL: Running for 1000 steps
Pe 0 has 7 local and 15 remote patches and 104 local and 85 remote computes.
Pe 3 has 7 local and 17 remote patches and 100 local and 89 remote computes.
Pe 1 has 6 local and 12 remote patches and 90 local and 72 remote computes.
Pe 2 has 7 local and 11 remote patches and 110 local and 79 remote computes.
allocating 14 MB of memory on GPU
allocating 12 MB of memory on GPU
allocating 15 MB of memory on GPU
allocating 14 MB of memory on GPU
CUDA EVENT TIMING: 1 0.359904 27.775040 0.459232 51.373665 0.305536 80.273376
CUDA TIMING: 88.939905 ms/step on node 1
CUDA EVENT TIMING: 2 0.588992 66.039459 0.497664 129.468674 0.380032 196.974823
CUDA TIMING: 201.452017 ms/step on node 2
CUDA EVENT TIMING: 0 10.262560 33.177601 0.315072 58.162750 0.188608 102.106590
CUDA TIMING: 103.115082 ms/step on node 0
CUDA EVENT TIMING: 3 0.513920 36.210625 0.484352 95.579971 0.333920 133.122787
CUDA TIMING: 144.111872 ms/step on node 3
ERROR: Constraint failure in RATTLE algorithm for atom 5131!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 5246!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 5108!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 8558!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 5798!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 8460!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 330!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 942!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 195!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 4182!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 888!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 3774!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 140!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 155!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 3452!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 830!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 3495!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 309!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Exiting prematurely; see error messages above.
====================================================

WallClock: 10.681122 CPUTime: 7.539854 Memory: 35.886200 MB
[lwy1_at_AMAX-mm1 Wat]#
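
The per-process step times in the log above can be compared directly. Here is a minimal sketch, assuming the "CUDA TIMING: <ms> ms/step on node <n>" line format shown in this log. The name cuda_ms_per_step is a hypothetical helper, not a NAMD tool.

```shell
# Print "node ms/step" for every "CUDA TIMING:" line in a NAMD log,
# slowest node first. "CUDA EVENT TIMING:" lines are skipped because
# they do not start with "CUDA TIMING:".
# $1 = NAMD log file. Hypothetical helper, not part of NAMD.
cuda_ms_per_step() {
    awk '/^CUDA TIMING:/ { print $NF, $3 }' "$1" | LC_ALL=C sort -k2,2nr
}
```

For the log above, cuda_ms_per_step log would list node 2 first at about 201 ms/step (presumably the process bound to the Quadro FX 3700), against roughly 89 to 144 ms/step for the other three nodes.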
