Re: FATAL ERROR: CudaTileListKernel::buildTileLists,

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Tue Mar 01 2022 - 08:45:03 CST

I think the key here is not the total number of atoms but this part of the
message:
*Too many atoms in a patch*
In other words, the local density seems to be the problem.
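
Two numbers from the log below support this. First, the "shared memory" in the error is the GPU's small per-block on-chip memory (on the order of 100 KB on a compute capability 8.6 card such as the RTX 3090), not its 24 GB of device memory, so the card's total memory does not help. Second, the log reports a 14 x 7 x 15 patch grid (1470 patches), so 1,472,897 atoms average about 1,000 per patch, yet "LARGEST PATCH (736) HAS 112867 ATOMS", more than 100 times the average. Below is a minimal sketch for locating the overcrowded region in the input PDB. It assumes an orthorhombic cell and approximates NAMD's patches by dividing the cell evenly. NAMD itself assigns hydrogen groups to patches, so its counts may differ somewhat. The function names are hypothetical.

```python
from collections import Counter


def read_pdb_coords(lines):
    """Return (x, y, z) tuples from ATOM/HETATM records.

    Uses the fixed PDB columns 31-38, 39-46 and 47-54 for x, y and z.
    """
    coords = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def patch_counts(coords, cell, center, grid):
    """Count atoms in each cell of an evenly divided orthorhombic periodic box.

    cell   -- box edge lengths (a, b, c) in Angstrom (orthorhombic assumed)
    center -- box center, e.g. the PERIODIC CELL CENTER line of the log
    grid   -- number of divisions per axis, e.g. the PATCH GRID line
    """
    counts = Counter()
    for pos in coords:
        index = []
        for p, length, c, n in zip(pos, cell, center, grid):
            # Fractional coordinate wrapped into [0, 1) across periodic images
            f = ((p - c) / length + 0.5) % 1.0
            # min() guards against f rounding up to exactly 1.0
            index.append(min(int(f * n), n - 1))
        counts[tuple(index)] += 1
    return counts


def summarize(counts, grid):
    """Return (fullest cell, its atom count, mean atoms per cell)."""
    n_cells = grid[0] * grid[1] * grid[2]
    fullest, worst = counts.most_common(1)[0]
    return fullest, worst, sum(counts.values()) / n_cells
```

With the values from the log (cell=(281.56, 145.95, 286.421), center=(-16.6987, 1.34986, 1.40648), grid=(14, 7, 15)) and the coordinates read from the tleap PDB, the largest entries of counts.most_common() would point at where atoms are piled up, e.g. overlapping or duplicated molecules.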

Without knowing how the system was set up, the best suggestion would be to
minimize it without the GPU, i.e. with NAMD 2 or with the CUDASOA flag off.
Unlike MD, which can be run in mixed precision under certain conditions,
minimization is much more reliable in double precision. I am not sure
whether the current GPU code supports that.
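
The precision point can be illustrated generically. This is not a description of how NAMD's GPU kernels accumulate forces or energies. A minimizer that compares total energies between line-search steps has to resolve small changes on top of a large total. Assume, for illustration only, a total energy on the order of -5e6 kcal/mol for a system of this size. At that magnitude, adjacent single-precision values are 0.5 apart, so a 0.2 kcal/mol improvement vanishes entirely, whereas double precision still resolves it. The helper names below are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp32(x):
    """Spacing between float32(x) and the next float32 of larger magnitude."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    neighbor = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return abs(neighbor - to_float32(x))


def change_visible(energy, delta, single):
    """True if energy + delta differs from energy at the chosen precision."""
    if single:
        return to_float32(energy + delta) != to_float32(energy)
    return energy + delta != energy


# Hypothetical total energy (kcal/mol) and a small line-search improvement
E_TOTAL = -5.0e6
print(ulp32(E_TOTAL))                               # spacing of float32 values near E_TOTAL
print(change_visible(E_TOTAL, -0.2, single=True))   # lost in single precision
print(change_visible(E_TOTAL, -0.2, single=False))  # kept in double precision
```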

Giacomo

On Tue, Mar 1, 2022 at 2:16 AM Tue Boesen <alyflex_at_gmail.com> wrote:

> I'm trying to run energy minimization using NAMD 3.0alpha9 for
> Linux-x86_64-multicore-CUDA. It works well for smaller systems, but for
> large systems I consistently get this error:
>
> FATAL ERROR: CudaTileListKernel::buildTileLists, maximum shared memory
> allocation exceeded. Too many atoms in a patch
>
> I'm running the minimization on a GeForce RTX 3090 with 24 GB of memory, so I
> believe I should have enough memory, although the log doesn't say exactly
> how much is being used.
>
> The system I'm minimizing has about 1.5M atoms, and consists of a protein
> in a box of water with a few Na+ Cl- ions.
>
> I have attached the logfile of the error below.
>
> Does anyone have any good suggestions for how to run this minimization?
>
> Cheers
> Tue
>
>
>
> Charm++> No provisioning arguments specified. Running with a single PE.
> Use +auto-provision to fully subscribe resources or +p1 to
> silence this message.
> Charm++: standalone mode (not using charmrun)
> Charm++> Running in Multicore mode: 1 threads (PEs)
> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
> Converse/Charm++ Commit ID: v6.10.2-0-g7bf00fa
> Warning> Randomization of virtual memory (ASLR) is turned on in the
> kernel, thread migration may not work! Run 'echo 0 >
> /proc/sys/kernel/randomize_va_space' as root to disable it, or try running
> with '+isomalloc_sync'.
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 hosts (1 sockets x 8 cores x 2 PUs = 16-way SMP)
> Charm++> cpu topology info is gathered in 0.000 seconds.
> Info: Built with CUDA version 11000
> Did not find +devices i,j,k,... argument, using all
> Pe 0 physical rank 0 binding to CUDA device 0 on tue-ubuntu: 'NVIDIA
> GeForce RTX 3090' Mem: 24265MB Rev: 8.6 PCI: 0:9:0
> Info: NAMD 3.0alpha9 for Linux-x86_64-multicore-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Chem. Phys. 153:044130 (2020)
> doi:10.1063/5.0014475
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 61002 for multicore-linux-x86_64-iccstatic
> Info: Built Sun Feb 28 21:57:49 CST 2021 by jmaia on manila.ks.uiuc.edu
> Info: 1 NAMD 3.0alpha9 Linux-x86_64-multicore-CUDA 1 tue-ubuntu tue
> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.283695 s
> Info: 0 MB of memory in use based on /proc/self/stat
> Info: Using bitfields in atom data structures.
> Info: sizeof( CompAtom ) = 32
> Info: sizeof( CompAtomExt ) = 8
> CkLoopLib is used in SMP with simple dynamic scheduling (converse-level
> notification)
> Info: Configuration file is
> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/namd/AF_1W_1WU_1WUZ_1_A.conf
> Info: Changed directory to
> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/namd
> TCL: Suspending until startup complete.
> Warning: The following variables were set in the
> Warning: configuration file but will be ignored:
> Warning: paraTypeXplor (parameters)
> Warning: paraTypeCharmm (parameters)
> Info: Using TIP3P water model.
> Warning: The Langevin gamma parameters differ over the particles,
> Warning: requiring extra work per step to constrain rigid bonds.
> Info: SIMULATION PARAMETERS:
> Info: TIMESTEP 1
> Info: NUMBER OF STEPS 0
> Info: STEPS PER CYCLE 20
> Info: PERIODIC CELL BASIS 1 281.56 0 0
> Info: PERIODIC CELL BASIS 2 0 145.95 0
> Info: PERIODIC CELL BASIS 3 0 0 286.421
> Info: PERIODIC CELL CENTER -16.6987 1.34986 1.40648
> Info: WRAPPING WATERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
> Info: LOAD BALANCER Centralized
> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
> Info: LDB PERIOD 4000 steps
> Info: FIRST LDB TIMESTEP 100
> Info: LAST LDB TIMESTEP -1
> Info: LDB BACKGROUND SCALING 1
> Info: HOM BACKGROUND SCALING 1
> Info: PME BACKGROUND SCALING 1
> Info: MIN ATOMS PER PATCH 40
> Info: INITIAL TEMPERATURE 310
> Info: CENTER OF MASS MOVING INITIALLY? NO
> Info: DIELECTRIC 1
> Info: EXCLUDE SCALED ONE-FOUR
> Info: 1-4 ELECTROSTATICS SCALED BY 0.833333
> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
> Info: DCD FILENAME min1.dcd
> Info: DCD FREQUENCY 200
> Info: DCD FIRST STEP 200
> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
> Info: NO VELOCITY DCD OUTPUT
> Info: NO FORCE DCD OUTPUT
> Info: OUTPUT FILENAME min1
> Info: RESTART FILENAME min1.restart
> Info: RESTART FREQUENCY 200
> Info: BINARY RESTART FILES WILL BE USED
> Info: CUTOFF 10
> Info: PAIRLIST DISTANCE 16
> Info: PAIRLIST SHRINK RATE 0.01
> Info: PAIRLIST GROW RATE 0.01
> Info: PAIRLIST TRIGGER 0.3
> Info: PAIRLISTS PER CYCLE 2
> Info: PAIRLIST OUTPUT STEPS 100
> Info: PAIRLISTS ENABLED
> Info: MARGIN 0.555
> Info: HYDROGEN GROUP CUTOFF 2.5
> Info: PATCH DIMENSION 19.055
> Info: ENERGY OUTPUT STEPS 200
> Info: ENERGY EVALUATION STEPS 200
> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
> Info: MOMENTUM OUTPUT STEPS 200
> Info: TIMING OUTPUT STEPS 200
> Info: PRESSURE OUTPUT STEPS 200
> Info: LANGEVIN DYNAMICS ACTIVE
> Info: LANGEVIN TEMPERATURE 310
> Info: LANGEVIN USING BBK INTEGRATOR
> Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
> Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
> Info: TARGET PRESSURE IS 1.01325 BAR
> Info: OSCILLATION PERIOD IS 200 FS
> Info: DECAY TIME IS 100 FS
> Info: PISTON TEMPERATURE IS 310 K
> Info: PRESSURE CONTROL IS GROUP-BASED
> Info: INITIAL STRAIN RATE IS 0 0 0
> Info: CELL FLUCTUATION IS ISOTROPIC
> Info: PARTICLE MESH EWALD (PME) ACTIVE
> Info: PME TOLERANCE 1e-06
> Info: PME EWALD COEFFICIENT 0.312341
> Info: PME INTERPOLATION ORDER 4
> Info: PME GRID DIMENSIONS 288 150 288
> Info: PME MAXIMUM GRID SPACING 1
> Info: Attempting to read FFTW data from
> FFTW_NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA.txt
> Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
> Info: Writing FFTW data to
> FFTW_NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA.txt
> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
> Info: USING VERLET I (r-RESPA) MTS SCHEME.
> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
> Info: RIGID BONDS TO HYDROGEN : ALL
> Info: ERROR TOLERANCE : 1e-08
> Info: MAX ITERATIONS : 100
> Info: RIGID WATER USING SETTLE ALGORITHM
> Info: RANDOM NUMBER SEED 1646117723
> Info: USE HYDROGEN BONDS? NO
> Info: Using AMBER format force field!
> Info: AMBER PARM FILE
> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.prmtop
> Info: COORDINATE PDB
> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.pdb
> Info: Exclusions will be read from PARM file!
> Info: SCNB (VDW SCALING) 2
> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
> Reading parm file
> (/media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.prmtop)
> ...
> PARM file in AMBER 7 format
> Warning: Skipping ATOMIC_NUMBER in parm file while seeking MASS.
> Warning: Skipping SCEE_SCALE_FACTOR in parm file while seeking SOLTY.
> Warning: Skipping SCNB_SCALE_FACTOR in parm file while seeking SOLTY.
> Warning: Found 485687 H-H bonds.
> Info: SUMMARY OF PARAMETERS:
> Info: 67 BONDS
> Info: 153 ANGLES
> Info: 198 DIHEDRAL
> Info: 0 IMPROPER
> Info: 0 CROSSTERM
> Info: 0 VDW
> Info: 153 VDW_PAIRS
> Info: 0 NBTHOLE_PAIRS
> Info: Reading pdb file
> /media/tue/Data/Data/test_mini/RCSB/../relax_pdb//calc/AF_1W_1WU_1WUZ_1_A/leap/AF_1W_1WU_1WUZ_1_A_neutral.pdb
> Info: TIME FOR READING PDB FILE: 0.900383
> Info:
> Info: LONG-RANGE LJ: APPLYING ANALYTICAL CORRECTIONS TO ENERGY AND PRESSURE
> Info: LONG-RANGE LJ: AVERAGE A AND B COEFFICIENTS 574955 AND 581.291
> Info: ****************************
> Info: STRUCTURE SUMMARY:
> Info: 1472897 ATOMS
> Info: 1471677 BONDS
> Info: 26517 ANGLES
> Info: 65693 DIHEDRALS
> Info: 0 IMPROPERS
> Info: 0 CROSSTERMS
> Info: 1536544 EXCLUSIONS
> Info: 1464268 RIGID BONDS
> Info: 2954423 DEGREES OF FREEDOM
> Info: 494316 HYDROGEN GROUPS
> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
> Info: 494316 MIGRATION GROUPS
> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
> Info: TOTAL MASS = 8.89287e+06 amu
> Info: TOTAL CHARGE = -3.33139e-05 e
> Info: MASS DENSITY = 1.25465 g/cm^3
> Info: ATOM DENSITY = 0.125139 atoms/A^3
> Info: *****************************
> Info:
> Info: Entering startup at 44.7864 s, 0 MB of memory in use
> Info: Startup phase 0 took 0.000248679 s, 0 MB of memory in use
> Info: ADDED 0 IMPLICIT EXCLUSIONS
> Info: Startup phase 1 took 0.201876 s, 0 MB of memory in use
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> Info: NONBONDED TABLE SIZE: 705 POINTS
> Info: ABSOLUTE IMPRECISION IN FAST TABLE FORCE: 2.64698e-22 AT 9.94673
> Info: RELATIVE IMPRECISION IN FAST TABLE FORCE: 5.64247e-16 AT 9.94673
> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000290479 AT 0.251946
> Info: ABSOLUTE IMPRECISION IN SCOR TABLE FORCE: 2.11758e-22 AT 9.94673
> Info: RELATIVE IMPRECISION IN SCOR TABLE FORCE: 5.86184e-16 AT 9.94673
> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000178193 AT 9.97184
> Info: ABSOLUTE IMPRECISION IN VDWA TABLE FORCE: 1.00974e-28 AT 9.99687
> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
> Info: ABSOLUTE IMPRECISION IN VDWB TABLE FORCE: 6.2204e-22 AT 9.99687
> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
> Info: Startup phase 2 took 0.00026995 s, 0 MB of memory in use
> Info: Startup phase 3 took 1.1152e-05 s, 0 MB of memory in use
> Info: Startup phase 4 took 0.00199203 s, 0 MB of memory in use
> Info: Startup phase 5 took 1.6442e-05 s, 0 MB of memory in use
> Info: PATCH GRID IS 14 (PERIODIC) BY 7 (PERIODIC) BY 15 (PERIODIC)
> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
> Info: REMOVING COM VELOCITY 0.000245969 -0.00383036 -0.00162446
> Info: LARGEST PATCH (736) HAS 112867 ATOMS
> Info: TORUS A SIZE 1 USING 0
> Info: TORUS B SIZE 1 USING 0
> Info: TORUS C SIZE 1 USING 0
> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
> Info: Placed 100% of base nodes on same physical node as patch
> Info: Startup phase 6 took 0.193281 s, 0 MB of memory in use
> Info: Use 3D box decompostion in PME FFT.
> Info: PME using 1 x 1 x 1 pencil grid for FFT and reciprocal sum.
> Info: Startup phase 7 took 0.000113754 s, 0 MB of memory in use
> Info: Updated CUDA force table with 4096 elements.
> Info: Updated CUDA LJ table with 17 x 17 elements.
> Info: Startup phase 8 took 0.0210923 s, 0 MB of memory in use
> Info: Startup phase 9 took 3.051e-05 s, 0 MB of memory in use
> Info: Startup phase 10 took 1.0641e-05 s, 0 MB of memory in use
> Info: Startup phase 11 took 0.000820673 s, 0 MB of memory in use
> LDB: Central LB being created...
> Info: Startup phase 12 took 0.000622233 s, 0 MB of memory in use
> Info: CREATING 30878 COMPUTE OBJECTS
> Info: Found 348 unique exclusion lists needing 1216 bytes
> Info: Startup phase 13 took 0.320426 s, 0 MB of memory in use
> Info: Startup phase 14 took 4.9448e-05 s, 0 MB of memory in use
> Info: Startup phase 15 took 0.00141796 s, 0 MB of memory in use
> Info: Finished startup at 45.5287 s, 0 MB of memory in use
>
> TCL: Minimizing for 100 steps
> FATAL ERROR: CudaTileListKernel::buildTileLists, maximum shared memory
> allocation exceeded. Too many atoms in a patch
> [Partition 0][Node 0] End of program
>

This archive was generated by hypermail 2.1.6 : Tue Dec 13 2022 - 14:32:44 CST