Re: NAMD3 failed to run longer than 4.2μs.

From: Rulong Ma (rulong.ma.jlu_at_gmail.com)
Date: Fri Nov 11 2022 - 14:58:51 CST

Hi Josh,

After resetting firsttimestep to 0, NAMD3 runs normally again. You were
right. Thanks so much.
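
For anyone who finds this thread later: the numbers in the log are consistent
with that explanation. Below is a minimal Python sketch, assuming the step
counter is stored as a signed 32-bit integer (Josh's hypothesis, not confirmed
against the NAMD source here). wrap_int32 is a hypothetical helper written for
this illustration, not part of NAMD.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, the usual value of INT_MAX


def wrap_int32(value):
    """Return value as it would read back from a signed 32-bit integer."""
    return ctypes.c_int32(value).value


first_timestep = 2_100_000_000  # step on the first ENERGY line of the log
run_steps = 75_000_000          # from "TCL: Running for 75000000 steps"

# Final step of the failing run: it no longer fits in 32 bits and wraps.
final_step = first_timestep + run_steps
print(final_step > INT32_MAX)
print(wrap_int32(final_step))

# Same run length with firsttimestep reset to 0: no wrap-around.
final_step_reset = 0 + run_steps
print(wrap_int32(final_step_reset) == final_step_reset)
```

The wrapped value is -2119967296, the same step number that appears in the
"WRITING ... AT STEP" lines of the output quoted below. With firsttimestep set
to 0, the final step stays far below INT_MAX.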

On Fri, Nov 11, 2022 at 2:00 PM Vermaas, Josh <vermaasj_at_msu.edu> wrote:

> Hi Rulong,
>
>
>
> 2100000000 steps is close to INT_MAX (2147483647), so I suspect you are
> getting an overflow somewhere NAMD3 stores the step count as an int. I’m
> betting that if you reset firsttimestep to 0, this simulation would run.
>
>
>
> -Josh
>
>
>
> *From: *<owner-namd-l_at_ks.uiuc.edu> on behalf of Rulong Ma <
> rulong.ma.jlu_at_gmail.com>
> *Reply-To: *"namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, Rulong Ma <
> rulong.ma.jlu_at_gmail.com>
> *Date: *Friday, November 11, 2022 at 1:17 PM
> *To: *"namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
> *Subject: *namd-l: NAMD3 failed to run longer than 4.2μs.
>
>
>
> Hi NAMD team.
>
>
>
> I have used NAMD3 for more than a year. However, NAMD3 stops working once
> a simulation runs longer than 4.2 μs. There are no errors in the output
> file. This problem has occurred in more than four simulation systems. The
> job script and output are below. Could you help me solve this problem?
>
>
>
> ******* job script********************************************
>
> #!/bin/bash
> #SBATCH -J jobname # Name of the job
> #SBATCH -N 1 # Number of nodes
> #SBATCH -n 1 # Number of tasks (cores)
> #SBATCH -t 7-00:00:00 # Runtime in D-HH:MM:SS
> #SBATCH --mem=16GB # Memory requested (see also --mem-per-cpu)
> #SBATCH -o comp_ca_%j.out # File to write STDOUT, %j=jobid
> #SBATCH -e comp_ca_%j.err # File to write STDERR, %j=jobid
> #SBATCH --mail-type=ALL # Send email when job starts, ends, fails, etc.
> #SBATCH --mail-user=
>
> #SBATCH --gres=gpu:1
>
> #Load NAMD
> module purge
> module add NAMD/.3.0alpha-cuda
>
>
>
> namd3 +setcpuaffinity +CmiSpinOnIdle +idlepoll step7_29production.inp > step7_29production.out
>
>
>
> ***************************************************
>
>
>
> ********output file*******************************************
>
> Warning: DUPLICATE DIHEDRAL ENTRY FOR O-C-CTO1-HAL1
> PREVIOUS VALUES MULTIPLICITY: 1
> k=0 n=1 delta=0
> INCREASING MULTIPLICITY TO: 2
> k=0 n=2 delta=180
> Warning: SKIPPING PART OF PARAMETER FILE AFTER RETURN STATEMENT
> Info: SUMMARY OF PARAMETERS:
> Info: 1955 BONDS
> Info: 6609 ANGLES
> Info: 19569 DIHEDRAL
> Info: 425 IMPROPER
> Info: 28 CROSSTERM
> Info: 736 VDW
> Info: 120 VDW_PAIRS
> Info: 0 NBTHOLE_PAIRS
> Warning: Ignored 62273 bonds with zero force constants.
> Warning: Will get H-H distance in rigid H2O from H-O-H angle.
> Info: TIME FOR READING PSF FILE: 31.1866
> Info: Reading pdb file step5_input.pdb
> Info: TIME FOR READING PDB FILE: 0.321895
> Info:
> Info: ****************************
> Info: STRUCTURE SUMMARY:
> Info: 253847 ATOMS
> Info: 190982 BONDS
> Info: 187813 ANGLES
> Info: 176049 DIHEDRALS
> Info: 3308 IMPROPERS
> Info: 992 CROSSTERMS
> Info: 0 EXCLUSIONS
> Info: 225753 RIGID BONDS
> Info: 535788 DEGREES OF FREEDOM
> Info: 90367 HYDROGEN GROUPS
> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
> Info: 90367 MIGRATION GROUPS
> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
> Info: TOTAL MASS = 1.537e+06 amu
> Info: TOTAL CHARGE = 3.14508e-05 e
> Info: MASS DENSITY = 1.02128 g/cm^3
> Info: ATOM DENSITY = 0.101575 atoms/A^3
>
> Info: *****************************
> Info: Reading from binary file step7_28production.restart.coor
> Info:
> Info: Entering startup at 62.5222 s, 0 MB of memory in use
> Info: Startup phase 0 took 0.000432299 s, 0 MB of memory in use
> Info: ADDED 554265 IMPLICIT EXCLUSIONS
> Info: Startup phase 1 took 0.114848 s, 0 MB of memory in use
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> Info: NONBONDED TABLE SIZE: 769 POINTS
> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000324844 AT 11.9556
> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
> Info: Startup phase 2 took 0.0130796 s, 0 MB of memory in use
> Info: Startup phase 3 took 3.568e-05 s, 0 MB of memory in use
> Info: Startup phase 4 took 0.000184432 s, 0 MB of memory in use
> Info: Startup phase 5 took 2.4695e-05 s, 0 MB of memory in use
> Info: PATCH GRID IS 4 (PERIODIC) BY 4 (PERIODIC) BY 6 (PERIODIC)
> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
> Info: Reading from binary file step7_28production.restart.vel
> Info: REMOVING COM VELOCITY 0.016666 -0.0256322 -0.0159315
> Info: LARGEST PATCH (13) HAS 2846 ATOMS
> Info: TORUS A SIZE 1 USING 0
> Info: TORUS B SIZE 1 USING 0
> Info: TORUS C SIZE 1 USING 0
> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
> Info: Placed 100% of base nodes on same physical node as patch
> Info: Startup phase 6 took 0.083115 s, 0 MB of memory in use
> Info: Use 3D box decompostion in PME FFT.
> Info: PME using 1 x 1 x 1 pencil grid for FFT and reciprocal sum.
> Info: Startup phase 7 took 0.000105719 s, 0 MB of memory in use
> Info: Updated CUDA force table with 4096 elements.
> Info: Updated CUDA LJ table with 736 x 736 elements.
> Info: Startup phase 8 took 0.00881762 s, 0 MB of memory in use
> Info: Startup phase 9 took 5.543e-05 s, 0 MB of memory in use
> Info: Startup phase 10 took 1.2858e-05 s, 0 MB of memory in use
> Info: Startup phase 11 took 0.00160414 s, 0 MB of memory in use
>
> LDB: Central LB being created...
> Info: Startup phase 12 took 0.000869604 s, 0 MB of memory in use
> Info: CREATING 2024 COMPUTE OBJECTS
> Info: Found 421 unique exclusion lists needing 1512 bytes
> Info: Startup phase 13 took 0.0548679 s, 0 MB of memory in use
> Info: Startup phase 14 took 3.4815e-05 s, 0 MB of memory in use
> Info: Startup phase 15 took 0.000198313 s, 0 MB of memory in use
> Info: Finished startup at 62.8005 s, 0 MB of memory in use
>
> TCL: Running for 75000000 steps
> ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG
>
> ENERGY: 2100000000 9000.2671 39907.6747 29885.8292 794.7429 -772113.2446 44539.8570 0.0000 0.0000 164962.7556 -483022.1179 309.8727 -647984.8735 -482186.8765 309.8727 -130.1821 -130.8904 2499116.2446 -130.1821 -130.8904
>
> OPENING EXTENDED SYSTEM TRAJECTORY FILE
> WRITING EXTENDED SYSTEM TO OUTPUT FILE AT STEP -2119967296
> CLOSING EXTENDED SYSTEM TRAJECTORY FILE
> WRITING COORDINATES TO OUTPUT FILE AT STEP -2119967296
> COORDINATE DCD FILE step7_29production.dcd WAS NOT CREATED
> The last position output (seq=-2) takes 0.010 seconds, 0.000 MB of memory
> in use
> WRITING VELOCITIES TO OUTPUT FILE AT STEP -2119967296
> The last velocity output (seq=-2) takes 0.007 seconds, 0.000 MB of memory
> in use
> ====================================================
>
> WallClock: 63.171822 CPUTime: 62.054176 Memory: 0.000000 MB
> [Partition 0][Node 0] End of program
>
>
>
> ********output file*******************************************
>
>
>
>
>
>
> --
>
> Best,
>
> Rulong Ma
>
>
>

-- 
Best,
Rulong Ma

This archive was generated by hypermail 2.1.6 : Tue Dec 13 2022 - 14:32:44 CST