From: Pietro Amodeo (pamodeo_at_icmib.na.cnr.it)
Date: Fri Nov 26 2010 - 18:07:46 CST
Hello,
I'm trying to run a simulation with NAMD 2.7 CUDA x86_64 on a system
including 120977 atoms (a large membrane protein complex, with lipids,
water and ions), on a 24-core workstation with two CUDA devices (1 Tesla
C2050 Mem: 2687MB, 1 GeForce GTX 480 Mem: 1535MB).
Regardless of the number of cores (from 1 to 4 with 1 GPU, from 2 to 4
with 2 GPUs) and of the GPUs used, and regardless of the settings of many
energy-related parameters (cutoffs, exclusions, ...), the calculation aborts
with the following error message:
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: 10304 bytes of CUDA constant memory needed for exclusions, but only 8192 bytes available. Increase MAX_EXCLUSIONS.
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error memcpy to exclusions: invalid argument
Fatal error on PE 1> FATAL ERROR: CUDA error memcpy to exclusions: invalid argument
While the number of processors obviously varies with the number of cores
used, the numbers of required and available bytes (the latter, I guess,
hardwired in the code) stay the same under every CPU/GPU/input
parameter setting I tried.
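As a rough illustration of the two byte counts in the error message, the sketch below converts them to bits and finds the next power of two that would hold the required amount. It assumes that exclusion flags are stored one per bit and that MAX_EXCLUSIONS is a bit count; neither assumption is confirmed by the message itself, and the helper name is hypothetical.

```python
# Sketch only: assumes one exclusion flag per bit and that
# MAX_EXCLUSIONS counts bits. Neither is confirmed by the error message.
BITS_PER_BYTE = 8

needed_bytes = 10304      # from the FATAL ERROR line
available_bytes = 8192    # from the FATAL ERROR line


def smallest_power_of_two_at_least(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    power = 1
    while power < n:
        power *= 2
    return power


needed_bits = needed_bytes * BITS_PER_BYTE
available_bits = available_bytes * BITS_PER_BYTE
candidate_bits = smallest_power_of_two_at_least(needed_bits)

print("bits needed:      ", needed_bits)
print("bits available:   ", available_bits)
print("next power of two:", candidate_bits, "bits =",
      candidate_bits // BITS_PER_BYTE, "bytes")
```

Under these assumptions the requirement exceeds the available space by about a quarter, so a single doubling of the limit would cover it. Whether a larger array fits next to the other constant data used by the CUDA kernels is not something the message shows.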
The same system runs flawlessly on the same workstation on up to 24 cores
with the CPU-only version of NAMD 2.7.
At the end of this message I have copied a representative output, obtained
with 2 CPU cores and 2 GPUs.
Of course, I can provide any other information or run any tests that may
help resolve the problem.
Thank you in advance for any help or suggestion.
Regards,
Pietro Amodeo
Dr. Pietro Amodeo
Istituto di Chimica Biomolecolare (ICB) del CNR
Comprensorio "A. Olivetti", Edificio 70
Via Campi Flegrei 34
I-80078 Pozzuoli (Napoli) - Italy
Email pamodeo_at_icmib.na.cnr.it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Charm++: scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (24-way SMP).
Charm++> Cpu topology info:
PE to node map: 0 0
Node to PE map:
Chip #0: 0 1
Charm++> cpu topology info is gathered in 0.007 seconds.
Info: NAMD 2.7 for Linux-x86_64-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60202 for net-linux-x86_64-iccstatic
Info: Built Wed Oct 13 11:39:40 CDT 2010 by jim on belfast.ks.uiuc.edu
Info: 1 NAMD 2.7 Linux-x86_64-CUDA 2 ulisse.icmib.na.cnr.it piero
Info: Running on 2 processors.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0095129 s
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 binding to CUDA device 0 on ulisse.icmib.na.cnr.it: 'GeForce GTX 480' Mem: 1535MB Rev: 2.0
Pe 1 physical rank 1 binding to CUDA device 1 on ulisse.icmib.na.cnr.it: 'Tesla C2050' Mem: 2687MB Rev: 2.0
Info: 1.632 MB of memory in use based on CmiMemoryUsage
Info: Changed directory to /home/Mol/System_COMPLEX/testCUDA
Info: Configuration file is System_COMPLEX_POPC_membr_nresPOPC_md5ps_npgt_unr.namd
TCL: Suspending until startup complete.
Warning: The following variables were set in the
Warning: configuration file but were not needed
Warning: fixedAtomsForces
Warning: fixedAtomsFile
Warning: fixedAtomsCol
Warning: consref
Warning: conskfile
Warning: conskcol
Warning: constraintScaling
Warning: selectConstraints
Warning: selectConstrX
Warning: selectConstrY
Warning: selectConstrZ
Info: EXTENDED SYSTEM FILE /home/Mol/System_COMPLEX/System_COMPLEX_POPC_membr_nresPOPC_md100ps_npt_unr.xsc
Warning: ALWAYS USE NON-ZERO MARGIN WITH CONSTANT PRESSURE!
Warning: CHANGING MARGIN FROM 0 to 0.81
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 1
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 20
Info: PERIODIC CELL BASIS 1 88.4383 0 0
Info: PERIODIC CELL BASIS 2 0 102.279 0
Info: PERIODIC CELL BASIS 3 0 0 130.301
Info: PERIODIC CELL CENTER -2.268 -0.160501 7.716
Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
Info: WRAPPING TO IMAGE NEAREST TO PERIODIC CELL CENTER.
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- ASB
Info: LDB PERIOD 4000 steps
Info: FIRST LDB TIMESTEP 100
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MAX SELF PARTITIONS 1
Info: MAX PAIR PARTITIONS 1
Info: SELF PARTITION ATOMS 154
Info: SELF2 PARTITION ATOMS 154
Info: PAIR PARTITION ATOMS 318
Info: PAIR2 PARTITION ATOMS 637
Info: MIN ATOMS PER PATCH 100
Info: VELOCITY FILE /home/Mol/System_COMPLEX/System_COMPLEX_POPC_membr_nresPOPC_md100ps_npt_unr.vel
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 1
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: DCD FILENAME /home/Mol/System_COMPLEX/testCUDA/System_COMPLEX_POPC_membr_nresPOPC_md5ns_npgt_unr.dcd
Info: DCD FREQUENCY 1000
Info: DCD FIRST STEP 1000
Info: DCD FILE WILL CONTAIN UNIT CELL DATA
Info: XST FILENAME /home/Mol/System_COMPLEX/testCUDA/System_COMPLEX_POPC_membr_nresPOPC_md5ns_npgt_unr.xst
Info: XST FREQUENCY 1000
Info: NO VELOCITY DCD OUTPUT
Info: OUTPUT FILENAME System_COMPLEX_POPC_membr_nresPOPC_md5ns_npgt_unr
Info: RESTART FILENAME System_COMPLEX_POPC_membr_nresPOPC_md5ns_npgt_unr.restart
Info: RESTART FREQUENCY 1000
Info: SWITCHING ACTIVE
Info: SWITCHING ON 8
Info: SWITCHING OFF 9
Info: PAIRLIST DISTANCE 11
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0.81
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 14.31
Info: ENERGY OUTPUT STEPS 1000
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 1000
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 310
Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
Info: TARGET PRESSURE IS 1.01325 BAR
Info: OSCILLATION PERIOD IS 1000 FS
Info: DECAY TIME IS 500 FS
Info: PISTON TEMPERATURE IS 310 K
Info: PRESSURE CONTROL IS GROUP-BASED
Info: INITIAL STRAIN RATE IS -1.11083e-06 -2.85184e-06 -3.99729e-06
Info: CELL FLUCTUATION IS ANISOTROPIC
Info: SURFACE TENSION CONTROL ACTIVE
Info: TARGET SURFACE TENSION IS 10 DYN/CM
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.348832
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 96 108 135
Info: PME MAXIMUM GRID SPACING 1.5
Info: Attempting to read FFTW data from FFTW_NAMD_2.7_Linux-x86_64-CUDA.txt
Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
Info: Writing FFTW data to FFTW_NAMD_2.7_Linux-x86_64-CUDA.txt
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 4
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : ALL
Info: ERROR TOLERANCE : 1e-08
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: NONBONDED FORCES EVALUATED EVERY 2 STEPS
Info: RANDOM NUMBER SEED 12345
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB /home/Mol/System_COMPLEX/System_COMPLEX_POPC_membr_nresPOPC_md100ps_npt_unr.coor
Info: STRUCTURE FILE /home/Mol/System_COMPLEX/System_COMPLEX_POPC_membr_nresPOPC.psf
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS /home/Mol/System_COMPLEX/par_all27_prot_lipid_na.inp
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Warning: DUPLICATE ANGLE ENTRY FOR CPH1-NR1-CPH2
PREVIOUS VALUES k=130 theta0=107.5 k_ub=0 r_ub=0
USING VALUES k=130 theta0=107 k_ub=0 r_ub=0
Info: SUMMARY OF PARAMETERS:
Info: 299 BONDS
Info: 729 ANGLES
Info: 1145 DIHEDRAL
Info: 84 IMPROPER
Info: 0 CROSSTERM
Info: 161 VDW
Info: 0 VDW_PAIRS
Info: TIME FOR READING PSF FILE: 1.99579
Info: TIME FOR READING PDB FILE: 0.193374
Info:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 120977 ATOMS
Info: 94395 BONDS
Info: 104455 ANGLES
Info: 110798 DIHEDRALS
Info: 2962 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 102969 RIGID BONDS
Info: 259962 DEGREES OF FREEDOM
Info: 44449 HYDROGEN GROUPS
Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
Info: 44449 MIGRATION GROUPS
Info: 4 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 739544 amu
Info: TOTAL CHARGE = 3.46228e-05 e
Info: MASS DENSITY = 1.04195 g/cm^3
Info: ATOM DENSITY = 0.102643 atoms/A^3
Info: *****************************
Info:
Info: Entering startup at 27.2138 s, 35.1788 MB of memory in use
Info: Startup phase 0 took 8.29697e-05 s, 35.1795 MB of memory in use
Info: Startup phase 1 took 0.558542 s, 60.6464 MB of memory in use
Info: Startup phase 2 took 0.000669003 s, 61.5756 MB of memory in use
Info: PATCH GRID IS 6 (PERIODIC) BY 7 (PERIODIC) BY 9 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: REMOVING COM VELOCITY -0.025689 -0.0247133 0.000427051
Info: LARGEST PATCH (214) HAS 367 ATOMS
Info: Startup phase 3 took 0.223135 s, 80.8164 MB of memory in use
Info: PME using 2 and 2 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 0 1
Info: PME TRANS LOCATIONS: 0 1
Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
Info: Startup phase 4 took 0.00213313 s, 86.2107 MB of memory in use
Info: Startup phase 5 took 0.00973487 s, 76.9986 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 6 took 1.20397 s, 77.0774 MB of memory in use
Info: CREATING 7576 COMPUTE OBJECTS
Info: useSync: 1 useProxySync: 0
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 705 POINTS
Info: Updated CUDA force table with 4096 elements.
Info: Found 449 unique exclusion lists needing 10304 bytes
FATAL ERROR: 10304 bytes of CUDA constant memory needed for exclusions, but only 8192 bytes available. Increase MAX_EXCLUSIONS.
FATAL ERROR: CUDA error memcpy to exclusions: invalid argument
[1] Stack Traceback:
[1:0] CmiAbort+0x7b [0xa76075]
[1:1] _Z8NAMD_diePKc+0x62 [0x5228a2]
[1:2] _Z13cuda_errcheckPKc+0x45 [0x6bc7cf]
[1:3] _ZN20ComputeNonbondedCUDA16build_exclusionsEv+0x14aa [0x6b85e2]
[1:4] _ZN20ComputeNonbondedCUDAC9EiP10ComputeMgr+0xa57 [0x6b6e53]
[1:5] _ZN20ComputeNonbondedCUDAC1EiP10ComputeMgr+0x6 [0x6b8810]
[1:6] _ZN10ComputeMgr13createComputeEiP10ComputeMap+0x2221 [0x5ab7ad]
[1:7] _ZN10ComputeMgr14createComputesEP10ComputeMap+0x3fb [0x5b1fa7]
[1:8] _ZN4Node7startupEv+0x2d1 [0x8b31f9]
[1:9] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12 [0x8b2f24]
[1:10] CkDeliverMessageFree+0x21 [0x9bdbab]
[1:11] _Z15_processHandlerPvP11CkCoreState+0x711 [0x9bca53]
[1:12] CsdScheduleForever+0xa5 [0xa7cc13]
[1:13] CsdScheduler+0x1c [0xa7c814]
[1:14] _Z11master_initiPPc+0x2c1 [0x52ba59]
[1:15] _ZN7BackEnd4initEiPPc+0x8f [0x52b78b]
[1:16] main+0x2f [0x526e87]
[1:17] __libc_start_main+0xfd [0x3361e1ec5d]
[1:18] _ZNSt8ios_base4InitD1Ev+0x72 [0x52219a]
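
For readers checking the startup values in the log above, the following sketch reproduces the logged PATCH DIMENSION and PATCH GRID from the other logged quantities. The two relations it uses (patch dimension as the sum of pairlist distance, margin and hydrogen group cutoff, and one whole patch per full patch dimension along each cell axis) are assumptions that happen to agree with these numbers; they are not taken from the NAMD source.

```python
import math

# Values copied from the log above.
pairlist_distance = 11.0
margin = 0.81
hydrogen_group_cutoff = 2.5
cell_lengths = (88.4383, 102.279, 130.301)  # diagonal of the periodic cell basis

# Assumption: the patch dimension is the sum of these three lengths.
patch_dimension = pairlist_distance + margin + hydrogen_group_cutoff

# Assumption: each axis holds as many whole patches as fit in the cell length.
patch_grid = tuple(math.floor(length / patch_dimension) for length in cell_lengths)

print(f"patch dimension: {patch_dimension:.2f}")
print("patch grid:", patch_grid)
```

Both results agree with the logged "PATCH DIMENSION 14.31" and "PATCH GRID IS 6 (PERIODIC) BY 7 (PERIODIC) BY 9 (PERIODIC)", which suggests the run got through its geometric setup normally and failed only when the CUDA exclusion data was built.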