Re: The biggest system simulated on one TESLA C2050 ?

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Wed Apr 13 2011 - 09:37:46 CDT

You're overflowing an int that counts bits in the exclusion compression
code, which is why it's trying to allocate "-237752280 bytes". I've just
checked in a fix for this (should be in the April 14 nightly build), but
you may still have problems.
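
As a rough illustration of how a bit counter held in a 32-bit int can turn
into a negative allocation size, here is a minimal sketch. It is not the
NAMD code: the helper names are made up, and it assumes that the byte count
in the log is the wrapped bit count divided by 8 and that the counter
wrapped exactly once.

```cpp
#include <cstdint>

// Value a two's-complement 32-bit int would hold after a 64-bit count
// wraps around. Explicit modular arithmetic is used so the result does
// not rely on implementation-defined narrowing conversions.
std::int64_t wrap_to_int32(std::int64_t value) {
    const std::int64_t two32 = std::int64_t(1) << 32;
    std::int64_t r = ((value % two32) + two32) % two32;  // now in [0, 2^32)
    return r >= two32 / 2 ? r - two32 : r;               // map to [-2^31, 2^31)
}

// Assumption: the byte count is simply the bit count divided by 8.
std::int64_t bits_to_bytes(std::int64_t bits) {
    return bits / 8;
}

// Smallest positive bit count whose 32-bit wraparound equals wrapped_bits.
// Assumes exactly one wrap; wrapped_bits is expected to be negative.
std::int64_t smallest_true_bits(std::int64_t wrapped_bits) {
    return wrapped_bits + (std::int64_t(1) << 32);
}
```

Under those assumptions, smallest_true_bits(-237752280 * 8) gives
2392949056 bits, i.e. 299118632 bytes (about 285 MB), and feeding that bit
count back through wrap_to_int32 and bits_to_bytes reproduces the
-237752280 shown in the log. Holding the counter in a 64-bit type avoids
the wrap.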

The fact that you have 100192 unique exclusion patterns suggests that you
have a huge number of cross-links in your system, and with only 7 VDW
types I'm guessing this isn't your average protein/lipid system.
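
To see why cross-links would inflate that count, here is a toy sketch. It
assumes, purely for illustration, that an atom's exclusion pattern is the
sorted set of index offsets (j - i) to every atom j within a few bonds of
atom i; the function names and this definition are hypothetical and are not
taken from the NAMD source.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <set>
#include <utility>
#include <vector>

using Bond = std::pair<int, int>;

// Count distinct patterns, where the pattern of atom i is the sorted list
// of offsets (j - i) for all atoms j reachable within max_bonds bonds.
std::size_t count_unique_patterns(int n_atoms, const std::vector<Bond>& bonds,
                                  int max_bonds) {
    std::vector<std::vector<int>> adj(n_atoms);
    for (const Bond& b : bonds) {
        adj[b.first].push_back(b.second);
        adj[b.second].push_back(b.first);
    }
    std::set<std::vector<int>> patterns;
    for (int i = 0; i < n_atoms; ++i) {
        // Breadth-first search from atom i, limited to max_bonds steps.
        std::vector<int> dist(n_atoms, -1);
        std::vector<int> offsets;
        std::queue<int> q;
        dist[i] = 0;
        q.push(i);
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            if (u != i) offsets.push_back(u - i);
            if (dist[u] == max_bonds) continue;  // do not expand further
            for (int v : adj[u]) {
                if (dist[v] < 0) {
                    dist[v] = dist[u] + 1;
                    q.push(v);
                }
            }
        }
        std::sort(offsets.begin(), offsets.end());
        patterns.insert(offsets);
    }
    return patterns.size();
}

// Bonds of an unbranched chain 0-1-2-...-(n-1).
std::vector<Bond> linear_chain(int n) {
    std::vector<Bond> bonds;
    for (int i = 0; i + 1 < n; ++i) bonds.emplace_back(i, i + 1);
    return bonds;
}
```

In a 20-atom linear chain with max_bonds = 3, every interior atom shares one
pattern and only the six atoms near the ends differ, so there are 7 patterns
in total. Adding a single bond between atoms far apart in index (say 5 and
14) gives the atoms around both ends of that bond offsets no other atom has,
so the count rises. Many such links in a large system would push the number
of unique patterns up quickly.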

-jim

On Wed, 30 Mar 2011, Marek Maly wrote:

> Dear all,
>
> I would like to know the largest system (by number of atoms in the box,
> including water) that has ever been simulated with the CUDA
> implementation of NAMD
> on a single GPU card, ideally a Tesla C2050 (say, with a 10 A cutoff).
>
> I am asking because we failed to simulate a system of approximately
> 300,000 atoms on our GPU workstation equipped with a Tesla C2050,
> obtaining this error:
>
> //////////////////////////////////////////////////
> Info: Updated CUDA LJ table with 7 x 7 elements.
> Info: Updated CUDA force table with 4096 elements.
> Info: Found 100192 unique exclusion lists needing -237752280 bytes
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: Could not malloc()--are we out of memory? (used: 326.885MB)
> ////////////////////////////////////////////////////
>
> The full record is below.
>
> I obtained the same error with the CUDA version compiled from source
> (./config Linux-x86_64-g++ --with-cuda) as
> well as with the prebuilt version NAMD_2.8b1_Linux-x86_64-CUDA.tar.gz.
>
> I tried to eliminate the problem by compiling with the memory optimisation
> flag (./config Linux-x86_64-g++ --with-cuda --with-memopt), but
> when I added that flag, compilation failed with this error:
>
> /////////////////////////////////////////////
> g++ -m64 -I.rootdir/charm-6.3.0/net-linux-x86_64/include -DCMK_OPTIMIZE=1
> -Isrc -Iinc -Iplugins/include -DSTATIC_PLUGIN -I.rootdir/tcl/include
> -DNAMD_TCL -I.rootdir/fftw/include -DNAMD_FFTW -DNAMD_CUDA -I.
> -I/usr/local/cuda/include -DMEM_OPT_VERSION -DNAMD_VERSION=\"2.8b1\"
> -DNAMD_PLATFORM=\"Linux-x86_64-CUDA-memopt\"
> -DREMOVE_PROXYRESULTMSG_EXTRACOPY -O3 -fexpensive-optimizations -ffast-math
> -o obj/ComputeNonbondedCUDA.o -c src/ComputeNonbondedCUDA.C
> In file included from src/ComputeNonbondedCUDA.C:13:
> src/ComputeNonbondedCUDAKernel.h:7:17: warning: extra tokens at end of #undef
> directive
> src/ComputeNonbondedCUDA.C: In member function void
> ComputeNonbondedCUDA::build_exclusions():
> src/ComputeNonbondedCUDA.C:425: error: class Molecule has no member named
> get_full_exclusions_for_atom
> src/ComputeNonbondedCUDA.C: In member function virtual void
> ComputeNonbondedCUDA::doWork():
> src/ComputeNonbondedCUDA.C:764: warning: converting to int from double
> src/ComputeNonbondedCUDA.C:879: warning: converting to int from double
> src/ComputeNonbondedCUDA.C: In member function int
> ComputeNonbondedCUDA::finishWork():
> src/ComputeNonbondedCUDA.C:1303: error: class Molecule has no member named
> get_full_exclusions_for_atom
> make: *** [obj/ComputeNonbondedCUDA.o] Error 1
> /////////////////////////////////////////////
>
> although without the memory optimisation flag compilation finished without
> any problems.
> For completeness, our g++ reports "gcc version 4.1.2 20080704", and we
> are using CentOS.
>
> Thanks in advance for any relevant comments or suggestions!
>
> Best wishes,
>
> Marek
>
> THE FULL OUTPUT FROM OUR ATTEMPT TO SIMULATE APPROX. 300k ATOMS
>
> [maly_at_pcm5227 big]$ $NAMD_GPU/namd2 +idlepoll +devices 1 INPUT
> Charm++: standalone mode (not using charmrun)
> Warning> Randomization of stack pointer is turned on in kernel, thread
> migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as
> root to disable it, or try run with '+isomalloc_sync'.
> Charm++> scheduler running in netpoll mode.
> Charm++> Running on 1 unique compute nodes (16-way SMP).
> Charm++> cpu topology info is gathered in 5.004 seconds.
> Info: NAMD 2.8b1 for Linux-x86_64-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat Mar 26 11:10:31 CDT 2011 by jim on larissa.ks.uiuc.edu
> Info: 1 NAMD 2.8b1 Linux-x86_64-CUDA 1 pcm5227 maly
> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 5.00608 s
> Pe 0 physical rank 0 binding to CUDA device 1 on pcm5227: 'Tesla C2050' Mem:
> 3071MB Rev: 2.0
> Info: 1.49953 MB of memory in use based on CmiMemoryUsage
> Info: Configuration file is INPUT
> Info: Working in the current directory /home/maly/forMAREK/big
> TCL: Suspending until startup complete.
> Warning: The following variables were set in the
> Warning: configuration file but will be ignored:
> Warning: switchdist (switching)
> Info: EXTENDED SYSTEM FILE output.12.xsc
> Info: SIMULATION PARAMETERS:
> Info: TIMESTEP 0.75
> Info: NUMBER OF STEPS 20000
> Info: STEPS PER CYCLE 10
> Info: PERIODIC CELL BASIS 1 231.096 0 0
> Info: PERIODIC CELL BASIS 2 0 254.376 0
> Info: PERIODIC CELL BASIS 3 0 0 253.047
> Info: PERIODIC CELL CENTER 0 0 0
> Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
> Info: LOAD BALANCER Centralized
> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
> Info: LDB PERIOD 2000 steps
> Info: FIRST LDB TIMESTEP 50
> Info: LAST LDB TIMESTEP -1
> Info: LDB BACKGROUND SCALING 1
> Info: HOM BACKGROUND SCALING 1
> Info: PME BACKGROUND SCALING 1
> Info: MIN ATOMS PER PATCH 40
> Info: VELOCITY FILE output.12.vel
> Info: CENTER OF MASS MOVING INITIALLY? NO
> Info: DIELECTRIC 1
> Info: EXCLUDE SCALED ONE-FOUR
> Info: 1-4 ELECTROSTATICS SCALED BY 0.833333
> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
> Info: DCD FILENAME OUT_DCD
> Info: DCD FREQUENCY 2000
> Info: DCD FIRST STEP 2000
> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
> Info: XST FILENAME OUTNAME.xst
> Info: XST FREQUENCY 2000
> Info: NO VELOCITY DCD OUTPUT
> Info: NO FORCE DCD OUTPUT
> Info: OUTPUT FILENAME OUTNAME
> Info: BINARY OUTPUT FILES WILL BE USED
> Info: RESTART FILENAME RESTART
> Info: RESTART FREQUENCY 2000
> Info: BINARY RESTART FILES WILL BE USED
> Info: CUTOFF 10
> Info: PAIRLIST DISTANCE 30
> Info: PAIRLIST SHRINK RATE 0.01
> Info: PAIRLIST GROW RATE 0.01
> Info: PAIRLIST TRIGGER 0.3
> Info: PAIRLISTS PER CYCLE 2
> Info: PAIRLISTS ENABLED
> Info: MARGIN 0.975
> Info: HYDROGEN GROUP CUTOFF 2.5
> Info: PATCH DIMENSION 33.475
> Info: ENERGY OUTPUT STEPS 2000
> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
> Info: TIMING OUTPUT STEPS 20000
> Info: PRESSURE OUTPUT STEPS 2000
> Info: LANGEVIN DYNAMICS ACTIVE
> Info: LANGEVIN TEMPERATURE 300
> Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
> Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
> Info: TARGET PRESSURE IS 1.01325 BAR
> Info: OSCILLATION PERIOD IS 100 FS
> Info: DECAY TIME IS 50 FS
> Info: PISTON TEMPERATURE IS 300 K
> Info: PRESSURE CONTROL IS GROUP-BASED
> Info: INITIAL STRAIN RATE IS -2.58418e-05 -2.58418e-05 -2.58418e-05
> Info: CELL FLUCTUATION IS ISOTROPIC
> Info: PARTICLE MESH EWALD (PME) ACTIVE
> Info: PME TOLERANCE 1e-06
> Info: PME EWALD COEFFICIENT 0.312341
> Info: PME INTERPOLATION ORDER 4
> Info: PME GRID DIMENSIONS 240 256 256
> Info: PME MAXIMUM GRID SPACING 1
> Info: Attempting to read FFTW data from FFTW_NAMD_2.8b1_Linux-x86_64-CUDA.txt
> Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
> Info: Writing FFTW data to FFTW_NAMD_2.8b1_Linux-x86_64-CUDA.txt
> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
> Info: USING VERLET I (r-RESPA) MTS SCHEME.
> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
> Info: RANDOM NUMBER SEED 1301507526
> Info: USE HYDROGEN BONDS? NO
> Info: Using AMBER format force field!
> Info: AMBER PARM FILE F11_vacnew.prmtop
> Info: COORDINATE PDB F11_vacnew_amber.pdb
> Info: Exclusions in PARM file will be ignored!
> Info: SCNB (VDW SCALING) 2
> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
> Info: BINARY COORDINATES output.12.coor
> Reading parm file (F11_vacnew.prmtop) ...
> PARM file in AMBER 7 format
> Info: SUMMARY OF PARAMETERS:
> Info: 10 BONDS
> Info: 21 ANGLES
> Info: 14 DIHEDRAL
> Info: 0 IMPROPER
> Info: 0 CROSSTERM
> Info: 0 VDW
> Info: 28 VDW_PAIRS
> Info: TIME FOR READING PDB FILE: 0.552101
> Info:
> Info: Reading from binary file output.12.coor
> Info: ****************************
> Info: STRUCTURE SUMMARY:
> Info: 309171 ATOMS
> Info: 323502 BONDS
> Info: 601959 ANGLES
> Info: 966408 DIHEDRALS
> Info: 0 IMPROPERS
> Info: 0 CROSSTERMS
> Info: 0 EXCLUSIONS
> Info: 927513 DEGREES OF FREEDOM
> Info: 147417 HYDROGEN GROUPS
> Info: 3 ATOMS IN LARGEST HYDROGEN GROUP
> Info: 147417 MIGRATION GROUPS
> Info: 3 ATOMS IN LARGEST MIGRATION GROUP
> Info: TOTAL MASS = 2.06854e+06 amu
> Info: TOTAL CHARGE = 0.000403497 e
> Info: MASS DENSITY = 0.230915 g/cm^3
> Info: ATOM DENSITY = 0.020784 atoms/A^3
> Info: *****************************
> Info:
> Info: Entering startup at 8.71072 s, 103.392 MB of memory in use
> Info: Startup phase 0 took 2.90871e-05 s, 103.392 MB of memory in use
> Info: Startup phase 1 took 0.622876 s, 195.255 MB of memory in use
> Info: Startup phase 2 took 0.00120592 s, 197.618 MB of memory in use
> Info: Startup phase 3 took 1.90735e-05 s, 197.617 MB of memory in use
> Info: PATCH GRID IS 6 (PERIODIC) BY 7 (PERIODIC) BY 7 (PERIODIC)
> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
> Info: Reading from binary file output.12.vel
> Info: REMOVING COM VELOCITY 0.015786 -0.022846 0.0100032
> Info: LARGEST PATCH (194) HAS 5836 ATOMS
> Info: Startup phase 4 took 0.0868609 s, 242.218 MB of memory in use
> Info: PME using 1 and 1 processors for FFT and reciprocal sum.
> Info: PME GRID LOCATIONS: 0
> Info: PME TRANS LOCATIONS: 0
> Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
> Info: Startup phase 5 took 0.0263231 s, 302.758 MB of memory in use
> Info: Startup phase 6 took 6.69956e-05 s, 302.758 MB of memory in use
> LDB: Central LB being created...
> Info: Startup phase 7 took 0.000169039 s, 302.76 MB of memory in use
> Info: CREATING 5888 COMPUTE OBJECTS
> Info: useSync: 1 useProxySync: 0
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> Info: NONBONDED TABLE SIZE: 705 POINTS
> Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 1.00974e-28 AT 9.99687
> Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 6.2204e-22 AT 9.99687
> Info: Updated CUDA LJ table with 7 x 7 elements.
> Info: Updated CUDA force table with 4096 elements.
> Info: Found 100192 unique exclusion lists needing -237752280 bytes
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: Could not malloc()--are we out of memory? (used: 326.885MB)
> [0] Stack Traceback:
> [0:0] CmiAbort+0x7b [0xadcae5]
> [0:1] CmiOutOfMemory+0x66 [0xa0c1d8]
> [0:2] malloc+0x35 [0xa0c55f]
> [0:3] _Znwm+0x1d [0x322b8bd17d]
> [0:4] _Znam+0x9 [0x322b8bd299]
> [0:5] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x15ee [0x6cc5a2]
> [0:6] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa75 [0x6cacd7]
> [0:7] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6 [0x6cc902]
> [0:8] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2119 [0x5be009]
> [0:9] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x45c [0x5c4fac]
> [0:10] _ZN4Node7startupEv+0x2d0 [0x8ef29e]
> [0:11] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12 [0x8eefca]
> [0:12] CkDeliverMessageFree+0x21 [0xa1a353]
> [0:13] _Z15_processHandlerPvP11CkCoreState+0x70b [0xa191fb]
> [0:14] CsdScheduleForever+0xa5 [0xae36ff]
> [0:15] CsdScheduler+0x1c [0xae3300]
> [0:16] _ZN7BackEnd7suspendEv+0xb [0x53eb65]
> [0:17] _ZN9ScriptTcl9initcheckEv+0x80 [0x969bac]
> [0:18] _ZN9ScriptTcl3runEv+0x66 [0x965734]
> [0:19] _Z18after_backend_initiPPc+0x4fd [0x53a515]
> [0:20] main+0x3a [0x539fe2]
> [0:21] __libc_start_main+0xf4 [0x3219a1d994]
> [0:22] _ZNSt8ios_base4InitD1Ev+0x6a [0x53599a]
> Charm++ fatal error:
> Could not malloc()--are we out of memory? (used: 326.885MB)
> [0] Stack Traceback:
> [0:0] /opt/NAMD/NAMD_2.8b1_Source/NAMD_GPU/namd2 [0xadd6f5]
> [0:1] CmiAbort+0xb9 [0xadcb23]
> [0:2] CmiOutOfMemory+0x66 [0xa0c1d8]
> [0:3] malloc+0x35 [0xa0c55f]
> [0:4] _Znwm+0x1d [0x322b8bd17d]
> [0:5] _Znam+0x9 [0x322b8bd299]
> [0:6] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x15ee [0x6cc5a2]
> [0:7] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa75 [0x6cacd7]
> [0:8] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6 [0x6cc902]
> [0:9] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2119 [0x5be009]
> [0:10] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x45c [0x5c4fac]
> [0:11] _ZN4Node7startupEv+0x2d0 [0x8ef29e]
> [0:12] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12 [0x8eefca]
> [0:13] CkDeliverMessageFree+0x21 [0xa1a353]
> [0:14] _Z15_processHandlerPvP11CkCoreState+0x70b [0xa191fb]
> [0:15] CsdScheduleForever+0xa5 [0xae36ff]
> [0:16] CsdScheduler+0x1c [0xae3300]
> [0:17] _ZN7BackEnd7suspendEv+0xb [0x53eb65]
> [0:18] _ZN9ScriptTcl9initcheckEv+0x80 [0x969bac]
> [0:19] _ZN9ScriptTcl3runEv+0x66 [0x965734]
> [0:20] _Z18after_backend_initiPPc+0x4fd [0x53a515]
> [0:21] main+0x3a [0x539fe2]
> [0:22] __libc_start_main+0xf4 [0x3219a1d994]
> [0:23] _ZNSt8ios_base4InitD1Ev+0x6a [0x53599a]
> Aborted
> [maly_at_pcm5227 big]$
