Re: The biggest system simulated on one TESLA C2050 ?

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Mar 30 2011 - 14:42:56 CDT

dear marek,

the error message rings a bell, but i don't remember the details.
please check the mailing list archive. i am currently running some
benchmark tests with 224,000 water molecules, i.e. 672,000 atoms
and a cutoff of 14 angstrom on a single and multiple C2050s.

i think there is some limitation in the NAMD CUDA code that is
there to boost performance, and you can avoid running into it by
some reordering of your input or similar.
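
one thing that stands out in the quoted log is the negative byte count
(-237752280). a negative size is what a signed 32-bit counter shows after
it wraps past 2^31 - 1. whether that is really what happens inside
build_exclusions() is an assumption here, not something confirmed in this
thread. a minimal python sketch of the arithmetic (wrap_int32 is a made-up
helper name, not part of NAMD):

```python
def wrap_int32(n):
    """Interpret an integer as a two's-complement signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


# Byte count exactly as printed in the NAMD log.
reported = -237752280

# Smallest non-negative size that would wrap to the reported value.
# Assumption: the counter is a signed 32-bit int and wrapped exactly once.
candidate = reported + 2**32

print(candidate, "bytes, about", round(candidate / 2**30, 2), "GiB")
assert wrap_int32(candidate) == reported
```

if the assumption holds, the request was for several GiB of exclusion
data rather than a negative amount, which would fit a malloc() failure
at only ~327 MB in use.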

apropos tesla C2050:
if you are not running it in a desktop and are getting worried about
the temperature, you might want to check out this hack. our
C2050s don't go beyond 60C even under full load and with
four of them side-by-side in the case.
http://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness

cheers,
    axel.

2011/3/30 Marek Maly <marek.maly_at_ujep.cz>:
> Dear all,
>
> I would like to know what is the biggest system (in terms of the number of
> atoms in the box, including the water atoms) that has ever been simulated
> with the CUDA implementation of NAMD
> on a single GPU card, if possible a TESLA C2050 (let's say with a cutoff of
> 10 A)?
>
> I am asking because we failed to simulate a system with ca. 300 000 atoms
> on our GPU workstation equipped with a Tesla C2050, obtaining
> this error.
>
> //////////////////////////////////////////////////
> Info: Updated CUDA LJ table with 7 x 7 elements.
> Info: Updated CUDA force table with 4096 elements.
> Info: Found 100192 unique exclusion lists needing -237752280 bytes
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: Could not malloc()--are we out of memory? (used: 326.885MB)
> ////////////////////////////////////////////////////
>
> The full record is below.
>
> I have obtained the same error with the CUDA version compiled from the sources
> ( ./config Linux-x86_64-g++ --with-cuda  ) as
> well as with the prebuilt version NAMD_2.8b1_Linux-x86_64-CUDA.tar.gz .
>
> I wanted to eliminate the problem by compiling with the memory optimisation
> flag (./config Linux-x86_64-g++ --with-cuda --with-memopt ) but
> when I added the memory optimisation flag, compilation failed with this error:
>
> /////////////////////////////////////////////
> g++ -m64 -I.rootdir/charm-6.3.0/net-linux-x86_64/include -DCMK_OPTIMIZE=1
> -Isrc -Iinc   -Iplugins/include -DSTATIC_PLUGIN -I.rootdir/tcl/include
> -DNAMD_TCL -I.rootdir/fftw/include -DNAMD_FFTW -DNAMD_CUDA -I.
> -I/usr/local/cuda/include -DMEM_OPT_VERSION  -DNAMD_VERSION=\"2.8b1\"
> -DNAMD_PLATFORM=\"Linux-x86_64-CUDA-memopt\"
>  -DREMOVE_PROXYRESULTMSG_EXTRACOPY   -O3 -fexpensive-optimizations
> -ffast-math  -o obj/ComputeNonbondedCUDA.o -c src/ComputeNonbondedCUDA.C
> In file included from src/ComputeNonbondedCUDA.C:13:
> src/ComputeNonbondedCUDAKernel.h:7:17: warning: extra tokens at end of
> #undef directive
> src/ComputeNonbondedCUDA.C: In member function void
> ComputeNonbondedCUDA::build_exclusions():
> src/ComputeNonbondedCUDA.C:425: error: class Molecule has no member named
> get_full_exclusions_for_atom
> src/ComputeNonbondedCUDA.C: In member function virtual void
> ComputeNonbondedCUDA::doWork():
> src/ComputeNonbondedCUDA.C:764: warning: converting to int from double
> src/ComputeNonbondedCUDA.C:879: warning: converting to int from double
> src/ComputeNonbondedCUDA.C: In member function int
> ComputeNonbondedCUDA::finishWork():
> src/ComputeNonbondedCUDA.C:1303: error: class Molecule has no member named
> get_full_exclusions_for_atom
> make: *** [obj/ComputeNonbondedCUDA.o] Error 1
> /////////////////////////////////////////////
>
> although without the memory optimisation flag compilation finished without
> any problems.
> Just for completeness, the version of our g++ is "gcc version 4.1.2
> 20080704" and we
> are using CentOS.
>
> Thanks in advance for any relevant comments or suggestions!
>
> Best wishes,
>
>   Marek
>
>
>
>
>
> THE FULL OUTPUT FROM OUR ATTEMPT TO SIMULATE CA. 300k ATOMS
>
>
> [maly_at_pcm5227 big]$  $NAMD_GPU/namd2  +idlepoll +devices 1  INPUT
> Charm++: standalone mode (not using charmrun)
> Warning> Randomization of stack pointer is turned on in kernel, thread
> migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space'
> as root to disable it, or try run with '+isomalloc_sync'.
> Charm++> scheduler running in netpoll mode.
> Charm++> Running on 1 unique compute nodes (16-way SMP).
> Charm++> cpu topology info is gathered in 5.004 seconds.
> Info: NAMD 2.8b1 for Linux-x86_64-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat Mar 26 11:10:31 CDT 2011 by jim on larissa.ks.uiuc.edu
> Info: 1 NAMD  2.8b1  Linux-x86_64-CUDA  1    pcm5227  maly
> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 5.00608 s
> Pe 0 physical rank 0 binding to CUDA device 1 on pcm5227: 'Tesla C2050'
>  Mem: 3071MB  Rev: 2.0
> Info: 1.49953 MB of memory in use based on CmiMemoryUsage
> Info: Configuration file is INPUT
> Info: Working in the current directory /home/maly/forMAREK/big
> TCL: Suspending until startup complete.
> Warning: The following variables were set in the
> Warning: configuration file but will be ignored:
> Warning:    switchdist (switching)
> Info: EXTENDED SYSTEM FILE   output.12.xsc
> Info: SIMULATION PARAMETERS:
> Info: TIMESTEP               0.75
> Info: NUMBER OF STEPS        20000
> Info: STEPS PER CYCLE        10
> Info: PERIODIC CELL BASIS 1  231.096 0 0
> Info: PERIODIC CELL BASIS 2  0 254.376 0
> Info: PERIODIC CELL BASIS 3  0 0 253.047
> Info: PERIODIC CELL CENTER   0 0 0
> Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
> Info: LOAD BALANCER  Centralized
> Info: LOAD BALANCING STRATEGY  New Load Balancers -- DEFAULT
> Info: LDB PERIOD             2000 steps
> Info: FIRST LDB TIMESTEP     50
> Info: LAST LDB TIMESTEP     -1
> Info: LDB BACKGROUND SCALING 1
> Info: HOM BACKGROUND SCALING 1
> Info: PME BACKGROUND SCALING 1
> Info: MIN ATOMS PER PATCH    40
> Info: VELOCITY FILE          output.12.vel
> Info: CENTER OF MASS MOVING INITIALLY? NO
> Info: DIELECTRIC             1
> Info: EXCLUDE                SCALED ONE-FOUR
> Info: 1-4 ELECTROSTATICS SCALED BY 0.833333
> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
> Info: DCD FILENAME           OUT_DCD
> Info: DCD FREQUENCY          2000
> Info: DCD FIRST STEP         2000
> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
> Info: XST FILENAME           OUTNAME.xst
> Info: XST FREQUENCY          2000
> Info: NO VELOCITY DCD OUTPUT
> Info: NO FORCE DCD OUTPUT
> Info: OUTPUT FILENAME        OUTNAME
> Info: BINARY OUTPUT FILES WILL BE USED
> Info: RESTART FILENAME       RESTART
> Info: RESTART FREQUENCY      2000
> Info: BINARY RESTART FILES WILL BE USED
> Info: CUTOFF                 10
> Info: PAIRLIST DISTANCE      30
> Info: PAIRLIST SHRINK RATE   0.01
> Info: PAIRLIST GROW RATE     0.01
> Info: PAIRLIST TRIGGER       0.3
> Info: PAIRLISTS PER CYCLE    2
> Info: PAIRLISTS ENABLED
> Info: MARGIN                 0.975
> Info: HYDROGEN GROUP CUTOFF  2.5
> Info: PATCH DIMENSION        33.475
> Info: ENERGY OUTPUT STEPS    2000
> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
> Info: TIMING OUTPUT STEPS    20000
> Info: PRESSURE OUTPUT STEPS  2000
> Info: LANGEVIN DYNAMICS ACTIVE
> Info: LANGEVIN TEMPERATURE   300
> Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
> Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
> Info:        TARGET PRESSURE IS 1.01325 BAR
> Info:     OSCILLATION PERIOD IS 100 FS
> Info:             DECAY TIME IS 50 FS
> Info:     PISTON TEMPERATURE IS 300 K
> Info:       PRESSURE CONTROL IS GROUP-BASED
> Info:    INITIAL STRAIN RATE IS -2.58418e-05 -2.58418e-05 -2.58418e-05
> Info:       CELL FLUCTUATION IS ISOTROPIC
> Info: PARTICLE MESH EWALD (PME) ACTIVE
> Info: PME TOLERANCE               1e-06
> Info: PME EWALD COEFFICIENT       0.312341
> Info: PME INTERPOLATION ORDER     4
> Info: PME GRID DIMENSIONS         240 256 256
> Info: PME MAXIMUM GRID SPACING    1
> Info: Attempting to read FFTW data from
> FFTW_NAMD_2.8b1_Linux-x86_64-CUDA.txt
> Info: Optimizing 6 FFT steps.  1... 2... 3... 4... 5... 6...   Done.
> Info: Writing FFTW data to FFTW_NAMD_2.8b1_Linux-x86_64-CUDA.txt
> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY      2
> Info: USING VERLET I (r-RESPA) MTS SCHEME.
> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
> Info: RANDOM NUMBER SEED     1301507526
> Info: USE HYDROGEN BONDS?    NO
> Info: Using AMBER format force field!
> Info: AMBER PARM FILE        F11_vacnew.prmtop
> Info: COORDINATE PDB         F11_vacnew_amber.pdb
> Info: Exclusions in PARM file will be ignored!
> Info: SCNB (VDW SCALING)     2
> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
> Info: BINARY COORDINATES     output.12.coor
> Reading parm file (F11_vacnew.prmtop) ...
> PARM file in AMBER 7 format
> Info: SUMMARY OF PARAMETERS:
> Info: 10 BONDS
> Info: 21 ANGLES
> Info: 14 DIHEDRAL
> Info: 0 IMPROPER
> Info: 0 CROSSTERM
> Info: 0 VDW
> Info: 28 VDW_PAIRS
> Info: TIME FOR READING PDB FILE: 0.552101
> Info:
> Info: Reading from binary file output.12.coor
> Info: ****************************
> Info: STRUCTURE SUMMARY:
> Info: 309171 ATOMS
> Info: 323502 BONDS
> Info: 601959 ANGLES
> Info: 966408 DIHEDRALS
> Info: 0 IMPROPERS
> Info: 0 CROSSTERMS
> Info: 0 EXCLUSIONS
> Info: 927513 DEGREES OF FREEDOM
> Info: 147417 HYDROGEN GROUPS
> Info: 3 ATOMS IN LARGEST HYDROGEN GROUP
> Info: 147417 MIGRATION GROUPS
> Info: 3 ATOMS IN LARGEST MIGRATION GROUP
> Info: TOTAL MASS = 2.06854e+06 amu
> Info: TOTAL CHARGE = 0.000403497 e
> Info: MASS DENSITY = 0.230915 g/cm^3
> Info: ATOM DENSITY = 0.020784 atoms/A^3
> Info: *****************************
> Info:
> Info: Entering startup at 8.71072 s, 103.392 MB of memory in use
> Info: Startup phase 0 took 2.90871e-05 s, 103.392 MB of memory in use
> Info: Startup phase 1 took 0.622876 s, 195.255 MB of memory in use
> Info: Startup phase 2 took 0.00120592 s, 197.618 MB of memory in use
> Info: Startup phase 3 took 1.90735e-05 s, 197.617 MB of memory in use
> Info: PATCH GRID IS 6 (PERIODIC) BY 7 (PERIODIC) BY 7 (PERIODIC)
> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
> Info: Reading from binary file output.12.vel
> Info: REMOVING COM VELOCITY 0.015786 -0.022846 0.0100032
> Info: LARGEST PATCH (194) HAS 5836 ATOMS
> Info: Startup phase 4 took 0.0868609 s, 242.218 MB of memory in use
> Info: PME using 1 and 1 processors for FFT and reciprocal sum.
> Info: PME GRID LOCATIONS: 0
> Info: PME TRANS LOCATIONS: 0
> Info: Optimizing 4 FFT steps.  1... 2... 3... 4...   Done.
> Info: Startup phase 5 took 0.0263231 s, 302.758 MB of memory in use
> Info: Startup phase 6 took 6.69956e-05 s, 302.758 MB of memory in use
> LDB: Central LB being created...
> Info: Startup phase 7 took 0.000169039 s, 302.76 MB of memory in use
> Info: CREATING 5888 COMPUTE OBJECTS
> Info: useSync: 1 useProxySync: 0
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> Info: NONBONDED TABLE SIZE: 705 POINTS
> Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 1.00974e-28 AT 9.99687
> Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 6.2204e-22 AT 9.99687
> Info: Updated CUDA LJ table with 7 x 7 elements.
> Info: Updated CUDA force table with 4096 elements.
> Info: Found 100192 unique exclusion lists needing -237752280 bytes
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: Could not malloc()--are we out of memory? (used: 326.885MB)
> [0] Stack Traceback:
>  [0:0] CmiAbort+0x7b  [0xadcae5]
>  [0:1] CmiOutOfMemory+0x66  [0xa0c1d8]
>  [0:2] malloc+0x35  [0xa0c55f]
>  [0:3] _Znwm+0x1d  [0x322b8bd17d]
>  [0:4] _Znam+0x9  [0x322b8bd299]
>  [0:5] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x15ee  [0x6cc5a2]
>  [0:6] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa75  [0x6cacd7]
>  [0:7] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6  [0x6cc902]
>  [0:8] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2119  [0x5be009]
>  [0:9] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x45c  [0x5c4fac]
>  [0:10] _ZN4Node7startupEv+0x2d0  [0x8ef29e]
>  [0:11] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12  [0x8eefca]
>  [0:12] CkDeliverMessageFree+0x21  [0xa1a353]
>  [0:13] _Z15_processHandlerPvP11CkCoreState+0x70b  [0xa191fb]
>  [0:14] CsdScheduleForever+0xa5  [0xae36ff]
>  [0:15] CsdScheduler+0x1c  [0xae3300]
>  [0:16] _ZN7BackEnd7suspendEv+0xb  [0x53eb65]
>  [0:17] _ZN9ScriptTcl9initcheckEv+0x80  [0x969bac]
>  [0:18] _ZN9ScriptTcl3runEv+0x66  [0x965734]
>  [0:19] _Z18after_backend_initiPPc+0x4fd  [0x53a515]
>  [0:20] main+0x3a  [0x539fe2]
>  [0:21] __libc_start_main+0xf4  [0x3219a1d994]
>  [0:22] _ZNSt8ios_base4InitD1Ev+0x6a  [0x53599a]
> Charm++ fatal error:
> Could not malloc()--are we out of memory? (used: 326.885MB)
> [0] Stack Traceback:
>  [0:0] /opt/NAMD/NAMD_2.8b1_Source/NAMD_GPU/namd2 [0xadd6f5]
>  [0:1] CmiAbort+0xb9  [0xadcb23]
>  [0:2] CmiOutOfMemory+0x66  [0xa0c1d8]
>  [0:3] malloc+0x35  [0xa0c55f]
>  [0:4] _Znwm+0x1d  [0x322b8bd17d]
>  [0:5] _Znam+0x9  [0x322b8bd299]
>  [0:6] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x15ee  [0x6cc5a2]
>  [0:7] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa75  [0x6cacd7]
>  [0:8] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6  [0x6cc902]
>  [0:9] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2119  [0x5be009]
>  [0:10] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x45c  [0x5c4fac]
>  [0:11] _ZN4Node7startupEv+0x2d0  [0x8ef29e]
>  [0:12] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12  [0x8eefca]
>  [0:13] CkDeliverMessageFree+0x21  [0xa1a353]
>  [0:14] _Z15_processHandlerPvP11CkCoreState+0x70b  [0xa191fb]
>  [0:15] CsdScheduleForever+0xa5  [0xae36ff]
>  [0:16] CsdScheduler+0x1c  [0xae3300]
>  [0:17] _ZN7BackEnd7suspendEv+0xb  [0x53eb65]
>  [0:18] _ZN9ScriptTcl9initcheckEv+0x80  [0x969bac]
>  [0:19] _ZN9ScriptTcl3runEv+0x66  [0x965734]
>  [0:20] _Z18after_backend_initiPPc+0x4fd  [0x53a515]
>  [0:21] main+0x3a  [0x539fe2]
>  [0:22] __libc_start_main+0xf4  [0x3219a1d994]
>  [0:23] _ZNSt8ios_base4InitD1Ev+0x6a  [0x53599a]
> Aborted
> [maly_at_pcm5227 big]$
>
>
>
>
> --
> This message was created with Opera's revolutionary e-mail client:
> http://www.opera.com/mail/
>
>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
