Re: CUDA-accelerated NAMD does not use the video card?

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Fri Nov 13 2009 - 17:06:48 CST

NVIDIA's docs lied about the maximum 1D texture size, so what worked on
compute 1.3 devices (like my desktop where I tend to test things) returned
garbage on 1.0 and 1.1 (like your Quadro). Sorry.

This is fixed in the newer CVS builds and 2.7b2.
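[Editor's note: the sketch below is not part of the original message. It is a small illustration of how one could scan a NAMD log for the "binding to CUDA device" lines, as seen in the log quoted below, and flag any device whose compute capability ("Rev:") is below 1.3, i.e. the devices affected by the 1D texture size issue described above. The function name and the regex are this editor's own, not anything shipped with NAMD.]

```python
import re

# Matches lines like:
# Pe 2 physical rank 2 binding to CUDA device 2 on AMAX-mm1: 'Quadro FX 3700' Mem: 511MB Rev: 1.1
BIND_RE = re.compile(
    r"binding to CUDA device (\d+) on \S+: '([^']+)' Mem: \d+MB Rev: (\d+)\.(\d+)"
)

def flag_old_devices(log_text, min_cc=(1, 3)):
    """Return [(device_index, name, (major, minor))] for devices below min_cc."""
    flagged = []
    for line in log_text.splitlines():
        m = BIND_RE.search(line)
        if m:
            dev, name = int(m.group(1)), m.group(2)
            cc = (int(m.group(3)), int(m.group(4)))
            if cc < min_cc:  # tuple comparison: (1, 1) < (1, 3)
                flagged.append((dev, name, cc))
    return flagged

log = """\
Pe 0 physical rank 0 binding to CUDA device 0 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 2 physical rank 2 binding to CUDA device 2 on AMAX-mm1: 'Quadro FX 3700' Mem: 511MB Rev: 1.1
"""
print(flag_old_devices(log))  # -> [(2, 'Quadro FX 3700', (1, 1))]
```

Given the device indices in the log below, another workaround before upgrading would be to restrict NAMD to the Teslas on the command line, e.g. `+devices 0,1,3` instead of `+devices 0,1,2,3`, so the compute-1.1 Quadro is never bound.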

-Jim

On Fri, 13 Nov 2009, Lee Wei Yang wrote:

> Dear NAMD users,
>
> I was able to utilize the 3 Tesla GPU cards I have (in an NVT ensemble with 42000+ atoms), but not the 3 Tesla GPU cards + 1 video card (Quadro FX 3700).
> From the log it seems that the video card is at least twice as slow as the Tesla cards and causes the error "ERROR: Constraint failure in RATTLE algorithm for atom 5131!" I have two quad-core Nehalem CPUs in the one machine with the 3 GPUs.
>
> I run it as
>
> /root/Software/NAMD_CVS_Linux-x86_64-CUDA/charmrun /root/Software/NAMD_CVS_Linux-x86_64-CUDA/namd2 ++local +idlepoll +p4 +devices 0,1,2,3 ./heat1.conf > log
>
> I attach the log at the bottom. Is this a known issue, that NAMD does not use video cards, or what is the reason that it saw my video card but was not able to use it?
>
> Also, I tried to run 3 CPUs + 3 GPUs for the ApoA1 benchmark and it gave this fatal error: "FATAL ERROR: CUDA-accelerated NAMD does not support NBFIX terms in parameter file." What happened? Any comment will help. Thanks.
>
>
> Lee
>
> [lwy1_at_AMAX-mm1 Wat]# cat log
> Charm warning> Randomization of stack pointer is turned on in Kernel, run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it. Thread migration may not work!
> Charm++> cpu topology info is being gathered.
> Charm++> Running on 1 unique compute nodes (16-way SMP).
> Info: NAMD CVS for Linux-x86_64-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60103 for net-linux-x86_64-iccstatic
> Info: Built Thu Nov 5 02:16:47 CST 2009 by jim on lisboa.ks.uiuc.edu
> Info: 1 NAMD CVS Linux-x86_64-CUDA 4 AMAX-mm1 root
> Info: Running on 4 processors.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.00517607 s
> Pe 1 physical rank 1 binding to CUDA device 1 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
> Pe 3 physical rank 3 binding to CUDA device 3 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
> Pe 0 physical rank 0 binding to CUDA device 0 on AMAX-mm1: 'Tesla C1060' Mem: 4095MB Rev: 1.3
> Pe 2 physical rank 2 binding to CUDA device 2 on AMAX-mm1: 'Quadro FX 3700' Mem: 511MB Rev: 1.1
> Info: 1.63618 MB of memory in use based on CmiMemoryUsage
> Info: Changed directory to .
> Info: Configuration file is heat1.conf
> TCL: Suspending until startup complete.
> Info: SIMULATION PARAMETERS:
> Info: TIMESTEP 2
> Info: NUMBER OF STEPS 0
> Info: STEPS PER CYCLE 10
> Info: PERIODIC CELL BASIS 1 84.5 0 0
> Info: PERIODIC CELL BASIS 2 0 81.2 0
> Info: PERIODIC CELL BASIS 3 0 0 77
> Info: PERIODIC CELL CENTER 0 0 0
> Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
> Info: LOAD BALANCE STRATEGY New Load Balancers -- ASB
> Info: LDB PERIOD 2000 steps
> Info: FIRST LDB TIMESTEP 50
> Info: LAST LDB TIMESTEP -1
> Info: LDB BACKGROUND SCALING 1
> Info: HOM BACKGROUND SCALING 1
> Info: PME BACKGROUND SCALING 1
> Info: MAX SELF PARTITIONS 1
> Info: MAX PAIR PARTITIONS 1
> Info: SELF PARTITION ATOMS 154
> Info: SELF2 PARTITION ATOMS 154
> Info: PAIR PARTITION ATOMS 318
> Info: PAIR2 PARTITION ATOMS 637
> Info: MIN ATOMS PER PATCH 100
> Info: INITIAL TEMPERATURE 100
> Info: CENTER OF MASS MOVING INITIALLY? NO
> Info: DIELECTRIC 1
> Info: EXCLUDE SCALED ONE-FOUR
> Info: 1-4 SCALE FACTOR 1
> Info: DCD FILENAME heating1.dcd
> Info: DCD FREQUENCY 250
> Info: DCD FIRST STEP 250
> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
> Info: XST FILENAME heating1.xst
> Info: XST FREQUENCY 250
> Info: NO VELOCITY DCD OUTPUT
> Info: OUTPUT FILENAME heating1
> Info: RESTART FILENAME heating1.restart
> Info: RESTART FREQUENCY 50
> Info: BINARY RESTART FILES WILL BE USED
> Info: SWITCHING ACTIVE
> Info: SWITCHING ON 18
> Info: SWITCHING OFF 20
> Info: PAIRLIST DISTANCE 22
> Info: PAIRLIST SHRINK RATE 0.01
> Info: PAIRLIST GROW RATE 0.01
> Info: PAIRLIST TRIGGER 0.3
> Info: PAIRLISTS PER CYCLE 2
> Info: PAIRLISTS ENABLED
> Info: MARGIN 0
> Info: HYDROGEN GROUP CUTOFF 2.5
> Info: PATCH DIMENSION 24.5
> Info: ENERGY OUTPUT STEPS 10
> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
> Info: TIMING OUTPUT STEPS 250
> Info: PRESSURE OUTPUT STEPS 10
> Info: LANGEVIN DYNAMICS ACTIVE
> Info: LANGEVIN TEMPERATURE 100
> Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
> Info: PARTICLE MESH EWALD (PME) ACTIVE
> Info: PME TOLERANCE 1e-06
> Info: PME EWALD COEFFICIENT 0.150787
> Info: PME INTERPOLATION ORDER 4
> Info: PME GRID DIMENSIONS 64 64 64
> Info: PME MAXIMUM GRID SPACING 1.5
> Info: Attempting to read FFTW data from FFTW_NAMD_CVS_Linux-x86_64-CUDA.txt
> Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
> Info: Writing FFTW data to FFTW_NAMD_CVS_Linux-x86_64-CUDA.txt
> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
> Info: USING VERLET I (r-RESPA) MTS SCHEME.
> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
> Info: RIGID BONDS TO HYDROGEN : ALL
> Info: ERROR TOLERANCE : 1e-08
> Info: MAX ITERATIONS : 100
> Info: RIGID WATER USING SETTLE ALGORITHM
> Info: RANDOM NUMBER SEED 1258147825
> Info: USE HYDROGEN BONDS? NO
> Info: COORDINATE PDB ./min4.pdb
> Info: STRUCTURE FILE ./DOPC72_w10.psf
> Info: PARAMETER file: CHARMM format!
> Info: PARAMETERS ./par_all27_lipid.prm
> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
> Info: SUMMARY OF PARAMETERS:
> Info: 43 BONDS
> Info: 103 ANGLES
> Info: 115 DIHEDRAL
> Info: 3 IMPROPER
> Info: 0 CROSSTERM
> Info: 34 VDW
> Info: 0 VDW_PAIRS
> Info: TIME FOR READING PSF FILE: 0.195251
> Info: TIME FOR READING PDB FILE: 0.073735
> Info:
> Info: ****************************
> Info: STRUCTURE SUMMARY:
> Info: 46437 ATOMS
> Info: 34198 BONDS
> Info: 31031 ANGLES
> Info: 26136 DIHEDRALS
> Info: 144 IMPROPERS
> Info: 0 CROSSTERMS
> Info: 0 EXCLUSIONS
> Info: 42549 RIGID BONDS
> Info: 96762 DEGREES OF FREEDOM
> Info: 16055 HYDROGEN GROUPS
> Info: TOTAL MASS = 275795 amu
> Info: TOTAL CHARGE = 9.38773e-06 e
> Info: MASS DENSITY = 0.866846 g/cm^3
> Info: ATOM DENSITY = 0.0878943 atoms/A^3
> Info: *****************************
> Info:
> Info: Entering startup at 8.58487 s, 13.3508 MB of memory in use
> Info: Startup phase 0 took 0.000151157 s, 13.3523 MB of memory in use
> Pairs: 0
> Info: Startup phase 1 took 0.05743 s, 22.3762 MB of memory in use
> Info: Startup phase 2 took 0.00048089 s, 22.7362 MB of memory in use
> Info: PATCH GRID IS 3 (PERIODIC) BY 3 (PERIODIC) BY 3 (PERIODIC)
> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
> Info: REMOVING COM VELOCITY 0.00769141 -0.0136897 -0.0258227
> Info: LARGEST PATCH (13) HAS 2156 ATOMS
> Info: CREATING 572 COMPUTE OBJECTS
> Info: Startup phase 3 took 0.018611 s, 28.1172 MB of memory in use
> Info: PME using 4 and 4 processors for FFT and reciprocal sum.
> Info: PME GRID LOCATIONS: 0 1 2 3
> Info: PME TRANS LOCATIONS: 0 1 2 3
> Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
> Info: Startup phase 4 took 0.00072813 s, 28.6361 MB of memory in use
> Info: Startup phase 5 took 0.0132589 s, 25.0001 MB of memory in use
> LDB: Measuring processor speeds ... Done.
> Info: Startup phase 6 took 1.20039 s, 24.7086 MB of memory in use
> Info: CREATING 572 COMPUTE OBJECTS
> Info: useSync: 1 useProxySync: 0
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> Info: NONBONDED TABLE SIZE: 833 POINTS
> create ComputeNonbondedCUDA
> Updated CUDA force table with 8192 elements.
> create ComputeNonbondedCUDA
> create ComputeNonbondedCUDA
> create ComputeNonbondedCUDA
> Pe 2 found 67 unique exclusion lists needing 244 bytes
> Pe 1 found 67 unique exclusion lists needing 244 bytes
> Pe 3 found 67 unique exclusion lists needing 244 bytes
> Pe 0 found 67 unique exclusion lists needing 244 bytes
> Info: Startup phase 7 took 0.0099709 s, 25.7216 MB of memory in use
> Info: Startup phase 8 took 9.10759e-05 s, 26.7311 MB of memory in use
> Info: Finished startup at 9.88598 s, 26.7311 MB of memory in use
>
> REINITIALIZING VELOCITIES AT STEP 0 TO 100 KELVIN.
> TCL: Setting parameter langevinTemp to 100
> TCL: Running for 1000 steps
> Pe 0 has 7 local and 15 remote patches and 104 local and 85 remote computes.
> Pe 3 has 7 local and 17 remote patches and 100 local and 89 remote computes.
> Pe 1 has 6 local and 12 remote patches and 90 local and 72 remote computes.
> Pe 2 has 7 local and 11 remote patches and 110 local and 79 remote computes.
> allocating 14 MB of memory on GPU
> allocating 12 MB of memory on GPU
> allocating 15 MB of memory on GPU
> allocating 14 MB of memory on GPU
> CUDA EVENT TIMING: 1 0.359904 27.775040 0.459232 51.373665 0.305536 80.273376
> CUDA TIMING: 88.939905 ms/step on node 1
> CUDA EVENT TIMING: 2 0.588992 66.039459 0.497664 129.468674 0.380032 196.974823
> CUDA TIMING: 201.452017 ms/step on node 2
> CUDA EVENT TIMING: 0 10.262560 33.177601 0.315072 58.162750 0.188608 102.106590
> CUDA TIMING: 103.115082 ms/step on node 0
> CUDA EVENT TIMING: 3 0.513920 36.210625 0.484352 95.579971 0.333920 133.122787
> CUDA TIMING: 144.111872 ms/step on node 3
> ERROR: Constraint failure in RATTLE algorithm for atom 5131!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 5246!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 5108!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 8558!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 5798!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 8460!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 330!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 942!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 195!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 4182!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 888!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 3774!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 140!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 155!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 3452!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 830!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 3495!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Constraint failure in RATTLE algorithm for atom 309!
> ERROR: Constraint failure; simulation has become unstable.
> ERROR: Exiting prematurely; see error messages above.
> ====================================================
>
> WallClock: 10.681122 CPUTime: 7.539854 Memory: 35.886200 MB
> [lwy1_at_AMAX-mm1 Wat]#
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:53:30 CST