The biggest system simulated on one TESLA C2050 ?

From: Marek Maly (marek.maly_at_ujep.cz)
Date: Wed Mar 30 2011 - 13:30:11 CDT

Dear all,

I would like to know the largest system (in number of atoms in the box,
including water) that has ever been simulated with the CUDA
implementation of NAMD on a single GPU card, if possible a Tesla C2050
(say, with a cutoff of 10 A).

I am asking because we failed to simulate a system of about 300,000 atoms
on our GPU workstation equipped with a Tesla C2050. The run ended with
this error:

//////////////////////////////////////////////////
Info: Updated CUDA LJ table with 7 x 7 elements.
Info: Updated CUDA force table with 4096 elements.
Info: Found 100192 unique exclusion lists needing -237752280 bytes
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Could not malloc()--are we out of memory? (used: 326.885MB)
////////////////////////////////////////////////////
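
A note on the negative byte count in the error above: one possible reading,
assuming the size is held in a signed 32-bit integer (an assumption; the log
does not say so), is that the true request wrapped around. The short Python
sketch below shows what the unwrapped size would be under that assumption.
The function names are hypothetical and are not part of NAMD.

```python
# Sketch only: assumes the byte count was stored in a signed 32-bit int
# and wrapped exactly once. Neither is confirmed by the log.

INT32_SPAN = 2**32


def to_int32(n: int) -> int:
    """Reduce an arbitrary integer to the value a signed 32-bit int would hold."""
    n &= 0xFFFFFFFF
    return n - INT32_SPAN if n >= 2**31 else n


def unwrap_int32(value: int, wraps: int = 1) -> int:
    """Return a size that would print as `value` after wrapping `wraps` times."""
    return value + wraps * INT32_SPAN


reported = -237752280            # value printed in the log
candidate = unwrap_int32(reported)

print("candidate size in bytes:", candidate)
print("candidate size in GiB:  ", round(candidate / 2**30, 2))
print("wraps back to reported: ", to_int32(candidate) == reported)
```

Under this assumption the request would be 4057215016 bytes, which is larger
than the 3071MB that the full log reports for the Tesla C2050. Whether that
is the actual cause of the abort is not established here.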

The full record is below.

I got the same error with the CUDA version compiled from source
( ./config Linux-x86_64-g++ --with-cuda ) as well as with the prebuilt
version NAMD_2.8b1_Linux-x86_64-CUDA.tar.gz .

I tried to eliminate the problem by compiling with the memory optimisation
flag ( ./config Linux-x86_64-g++ --with-cuda --with-memopt ), but with
that flag added the compilation failed with this error:

/////////////////////////////////////////////
g++ -m64 -I.rootdir/charm-6.3.0/net-linux-x86_64/include -DCMK_OPTIMIZE=1
-Isrc -Iinc -Iplugins/include -DSTATIC_PLUGIN -I.rootdir/tcl/include
-DNAMD_TCL -I.rootdir/fftw/include -DNAMD_FFTW -DNAMD_CUDA -I.
-I/usr/local/cuda/include -DMEM_OPT_VERSION -DNAMD_VERSION=\"2.8b1\"
-DNAMD_PLATFORM=\"Linux-x86_64-CUDA-memopt\"
-DREMOVE_PROXYRESULTMSG_EXTRACOPY -O3 -fexpensive-optimizations
-ffast-math -o obj/ComputeNonbondedCUDA.o -c src/ComputeNonbondedCUDA.C
In file included from src/ComputeNonbondedCUDA.C:13:
src/ComputeNonbondedCUDAKernel.h:7:17: warning: extra tokens at end of
#undef directive
src/ComputeNonbondedCUDA.C: In member function void
ComputeNonbondedCUDA::build_exclusions():
src/ComputeNonbondedCUDA.C:425: error: class Molecule has no member named
get_full_exclusions_for_atom
src/ComputeNonbondedCUDA.C: In member function virtual void
ComputeNonbondedCUDA::doWork():
src/ComputeNonbondedCUDA.C:764: warning: converting to int from double
src/ComputeNonbondedCUDA.C:879: warning: converting to int from double
src/ComputeNonbondedCUDA.C: In member function int
ComputeNonbondedCUDA::finishWork():
src/ComputeNonbondedCUDA.C:1303: error: class Molecule has no member named
get_full_exclusions_for_atom
make: *** [obj/ComputeNonbondedCUDA.o] Error 1
/////////////////////////////////////////////

Without the memory optimisation flag, compilation finished without any
problems.
For completeness, our g++ version is "gcc version 4.1.2 20080704" and we
are using CentOS.

Thanks in advance for any relevant comments or suggestions!

Best wishes,

    Marek

THE FULL OUTPUT FROM OUR ATTEMPT TO SIMULATE ABOUT 300k ATOMS:

[maly_at_pcm5227 big]$ $NAMD_GPU/namd2 +idlepoll +devices 1 INPUT
Charm++: standalone mode (not using charmrun)
Warning> Randomization of stack pointer is turned on in kernel, thread
migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space'
as root to disable it, or try run with '+isomalloc_sync'.
Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 5.004 seconds.
Info: NAMD 2.8b1 for Linux-x86_64-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Mar 26 11:10:31 CDT 2011 by jim on larissa.ks.uiuc.edu
Info: 1 NAMD 2.8b1 Linux-x86_64-CUDA 1 pcm5227 maly
Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 5.00608 s
Pe 0 physical rank 0 binding to CUDA device 1 on pcm5227: 'Tesla C2050'
Mem: 3071MB Rev: 2.0
Info: 1.49953 MB of memory in use based on CmiMemoryUsage
Info: Configuration file is INPUT
Info: Working in the current directory /home/maly/forMAREK/big
TCL: Suspending until startup complete.
Warning: The following variables were set in the
Warning: configuration file but will be ignored:
Warning: switchdist (switching)
Info: EXTENDED SYSTEM FILE output.12.xsc
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 0.75
Info: NUMBER OF STEPS 20000
Info: STEPS PER CYCLE 10
Info: PERIODIC CELL BASIS 1 231.096 0 0
Info: PERIODIC CELL BASIS 2 0 254.376 0
Info: PERIODIC CELL BASIS 3 0 0 253.047
Info: PERIODIC CELL CENTER 0 0 0
Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 2000 steps
Info: FIRST LDB TIMESTEP 50
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MIN ATOMS PER PATCH 40
Info: VELOCITY FILE output.12.vel
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 0.833333
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: DCD FILENAME OUT_DCD
Info: DCD FREQUENCY 2000
Info: DCD FIRST STEP 2000
Info: DCD FILE WILL CONTAIN UNIT CELL DATA
Info: XST FILENAME OUTNAME.xst
Info: XST FREQUENCY 2000
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME OUTNAME
Info: BINARY OUTPUT FILES WILL BE USED
Info: RESTART FILENAME RESTART
Info: RESTART FREQUENCY 2000
Info: BINARY RESTART FILES WILL BE USED
Info: CUTOFF 10
Info: PAIRLIST DISTANCE 30
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0.975
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 33.475
Info: ENERGY OUTPUT STEPS 2000
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 20000
Info: PRESSURE OUTPUT STEPS 2000
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 300
Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
Info: TARGET PRESSURE IS 1.01325 BAR
Info: OSCILLATION PERIOD IS 100 FS
Info: DECAY TIME IS 50 FS
Info: PISTON TEMPERATURE IS 300 K
Info: PRESSURE CONTROL IS GROUP-BASED
Info: INITIAL STRAIN RATE IS -2.58418e-05 -2.58418e-05 -2.58418e-05
Info: CELL FLUCTUATION IS ISOTROPIC
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.312341
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 240 256 256
Info: PME MAXIMUM GRID SPACING 1
Info: Attempting to read FFTW data from
FFTW_NAMD_2.8b1_Linux-x86_64-CUDA.txt
Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
Info: Writing FFTW data to FFTW_NAMD_2.8b1_Linux-x86_64-CUDA.txt
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RANDOM NUMBER SEED 1301507526
Info: USE HYDROGEN BONDS? NO
Info: Using AMBER format force field!
Info: AMBER PARM FILE F11_vacnew.prmtop
Info: COORDINATE PDB F11_vacnew_amber.pdb
Info: Exclusions in PARM file will be ignored!
Info: SCNB (VDW SCALING) 2
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: BINARY COORDINATES output.12.coor
Reading parm file (F11_vacnew.prmtop) ...
PARM file in AMBER 7 format
Info: SUMMARY OF PARAMETERS:
Info: 10 BONDS
Info: 21 ANGLES
Info: 14 DIHEDRAL
Info: 0 IMPROPER
Info: 0 CROSSTERM
Info: 0 VDW
Info: 28 VDW_PAIRS
Info: TIME FOR READING PDB FILE: 0.552101
Info:
Info: Reading from binary file output.12.coor
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 309171 ATOMS
Info: 323502 BONDS
Info: 601959 ANGLES
Info: 966408 DIHEDRALS
Info: 0 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 927513 DEGREES OF FREEDOM
Info: 147417 HYDROGEN GROUPS
Info: 3 ATOMS IN LARGEST HYDROGEN GROUP
Info: 147417 MIGRATION GROUPS
Info: 3 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 2.06854e+06 amu
Info: TOTAL CHARGE = 0.000403497 e
Info: MASS DENSITY = 0.230915 g/cm^3
Info: ATOM DENSITY = 0.020784 atoms/A^3
Info: *****************************
Info:
Info: Entering startup at 8.71072 s, 103.392 MB of memory in use
Info: Startup phase 0 took 2.90871e-05 s, 103.392 MB of memory in use
Info: Startup phase 1 took 0.622876 s, 195.255 MB of memory in use
Info: Startup phase 2 took 0.00120592 s, 197.618 MB of memory in use
Info: Startup phase 3 took 1.90735e-05 s, 197.617 MB of memory in use
Info: PATCH GRID IS 6 (PERIODIC) BY 7 (PERIODIC) BY 7 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: Reading from binary file output.12.vel
Info: REMOVING COM VELOCITY 0.015786 -0.022846 0.0100032
Info: LARGEST PATCH (194) HAS 5836 ATOMS
Info: Startup phase 4 took 0.0868609 s, 242.218 MB of memory in use
Info: PME using 1 and 1 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 0
Info: PME TRANS LOCATIONS: 0
Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
Info: Startup phase 5 took 0.0263231 s, 302.758 MB of memory in use
Info: Startup phase 6 took 6.69956e-05 s, 302.758 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 7 took 0.000169039 s, 302.76 MB of memory in use
Info: CREATING 5888 COMPUTE OBJECTS
Info: useSync: 1 useProxySync: 0
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 705 POINTS
Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 1.00974e-28 AT 9.99687
Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 6.2204e-22 AT 9.99687
Info: Updated CUDA LJ table with 7 x 7 elements.
Info: Updated CUDA force table with 4096 elements.
Info: Found 100192 unique exclusion lists needing -237752280 bytes
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Could not malloc()--are we out of memory? (used: 326.885MB)
[0] Stack Traceback:
   [0:0] CmiAbort+0x7b [0xadcae5]
   [0:1] CmiOutOfMemory+0x66 [0xa0c1d8]
   [0:2] malloc+0x35 [0xa0c55f]
   [0:3] _Znwm+0x1d [0x322b8bd17d]
   [0:4] _Znam+0x9 [0x322b8bd299]
   [0:5] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x15ee [0x6cc5a2]
   [0:6] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa75 [0x6cacd7]
   [0:7] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6 [0x6cc902]
   [0:8] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2119 [0x5be009]
   [0:9] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x45c [0x5c4fac]
   [0:10] _ZN4Node7startupEv+0x2d0 [0x8ef29e]
   [0:11] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12 [0x8eefca]
   [0:12] CkDeliverMessageFree+0x21 [0xa1a353]
   [0:13] _Z15_processHandlerPvP11CkCoreState+0x70b [0xa191fb]
   [0:14] CsdScheduleForever+0xa5 [0xae36ff]
   [0:15] CsdScheduler+0x1c [0xae3300]
   [0:16] _ZN7BackEnd7suspendEv+0xb [0x53eb65]
   [0:17] _ZN9ScriptTcl9initcheckEv+0x80 [0x969bac]
   [0:18] _ZN9ScriptTcl3runEv+0x66 [0x965734]
   [0:19] _Z18after_backend_initiPPc+0x4fd [0x53a515]
   [0:20] main+0x3a [0x539fe2]
   [0:21] __libc_start_main+0xf4 [0x3219a1d994]
   [0:22] _ZNSt8ios_base4InitD1Ev+0x6a [0x53599a]
Charm++ fatal error:
Could not malloc()--are we out of memory? (used: 326.885MB)
[0] Stack Traceback:
   [0:0] /opt/NAMD/NAMD_2.8b1_Source/NAMD_GPU/namd2 [0xadd6f5]
   [0:1] CmiAbort+0xb9 [0xadcb23]
   [0:2] CmiOutOfMemory+0x66 [0xa0c1d8]
   [0:3] malloc+0x35 [0xa0c55f]
   [0:4] _Znwm+0x1d [0x322b8bd17d]
   [0:5] _Znam+0x9 [0x322b8bd299]
   [0:6] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x15ee [0x6cc5a2]
   [0:7] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa75 [0x6cacd7]
   [0:8] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6 [0x6cc902]
   [0:9] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2119 [0x5be009]
   [0:10] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x45c [0x5c4fac]
   [0:11] _ZN4Node7startupEv+0x2d0 [0x8ef29e]
   [0:12] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12 [0x8eefca]
   [0:13] CkDeliverMessageFree+0x21 [0xa1a353]
   [0:14] _Z15_processHandlerPvP11CkCoreState+0x70b [0xa191fb]
   [0:15] CsdScheduleForever+0xa5 [0xae36ff]
   [0:16] CsdScheduler+0x1c [0xae3300]
   [0:17] _ZN7BackEnd7suspendEv+0xb [0x53eb65]
   [0:18] _ZN9ScriptTcl9initcheckEv+0x80 [0x969bac]
   [0:19] _ZN9ScriptTcl3runEv+0x66 [0x965734]
   [0:20] _Z18after_backend_initiPPc+0x4fd [0x53a515]
   [0:21] main+0x3a [0x539fe2]
   [0:22] __libc_start_main+0xf4 [0x3219a1d994]
   [0:23] _ZNSt8ios_base4InitD1Ev+0x6a [0x53599a]
Aborted
[maly_at_pcm5227 big]$
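
The patch figures in the log above can be cross-checked with a little
arithmetic. The Python sketch below assumes that the patch dimension is the
sum of the pairlist distance, the margin and the hydrogen group cutoff, and
that the patch grid along each axis is the cell length divided by the patch
dimension, rounded down. Both rules are assumptions inferred from the fact
that they reproduce the logged numbers; they are not taken from the NAMD
source. All variable names are illustrative.

```python
import math

# Values copied from the "Info:" lines of the log.
pairlist_distance = 30.0
margin = 0.975
hydrogen_group_cutoff = 2.5
cell = (231.096, 254.376, 253.047)   # periodic cell basis lengths
n_atoms = 309171

# Assumed rule: patch dimension is the sum of the three distances.
patch_dimension = pairlist_distance + margin + hydrogen_group_cutoff

# Assumed rule: number of patches per axis is floor(cell length / patch dimension).
patch_grid = tuple(math.floor(length / patch_dimension) for length in cell)

n_patches = patch_grid[0] * patch_grid[1] * patch_grid[2]
avg_atoms_per_patch = n_atoms / n_patches

print("patch dimension:", round(patch_dimension, 3))
print("patch grid:", patch_grid)
print("patches:", n_patches)
print("average atoms per patch:", round(avg_atoms_per_patch, 1))
```

These rules give a patch dimension of 33.475 and a 6 by 7 by 7 grid, as in the
log. The average patch then holds roughly 1050 atoms, against the 5836 atoms
reported for the largest patch. Under the assumed rule, the patch size here
is driven mainly by PAIRLIST DISTANCE 30 rather than by CUTOFF 10.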

-- 
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:56:52 CST