REMD Problem

From: Anup Prasad (anup.prasad_at_monash.edu)
Date: Wed Sep 23 2020 - 05:47:08 CDT

I am using NAMD for my MD simulations. I want to run them on the CPU nodes
(each node has 40 cores) of the HPC facility here (a Cray XC). As a test I
am running the REMD example (8 replicas) given at
https://www.ks.uiuc.edu/Research/namd/2.9/ug/node66.html.
With the script below, performance is very poor and the replicas do not
seem to share the cores. I split the job among 5 processors, but the work
is not shared among them. Please help me find an appropriate submission
script that uses the full resources of the cluster and distributes the
replicas properly.

*HPC specifications*

*Operating System -- Cray Linux Environment Version - 6.x*

*Cray Programming Environment (CPE) -- Unlimited*

*Intel Parallel Studio XE -- 5 Seats*

*PGI Accelerator -- 2 Seats*

*Workload Manager -- PBS Pro*

*Compute Node - CPU only Node*

*Number of Nodes -- 212 Regular CPU Nodes + 4 High Memory Nodes*

*Processor -- 2X Intel Skylake 6148 2.4 GHz 20C*

*Memory Per Node -- 192 GB DDR4-2666 with Chipkill technology*

*Memory Per Node (High Memory Node) -- 1536 GB DDR4-2666 with Chipkill
technology*

*This is the shell script I use to submit jobs:*

*##############################################################################*

* submitting shell script*

*##############################################################################*
#PBS -N REMD_test
#PBS -q devel
#PBS -l select=1:ncpus=40:vntype=cray_compute
#PBS -l walltime=00:05:00
#PBS -l place=pack
#PBS -j oe

module load namd/2.12/intel-18.0.1

cd $PBS_O_WORKDIR
aprun -n 8 -N 8 -d 5 /home/apps/namd/2.12/intel/18.0.1/CRAY-XC-intel/namd2 \
    +replicas 8 job0.conf +stdout output/%d/job0.%d.log
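
For reference, the arithmetic behind the aprun line can be sketched as
below. This is only a reading of the flags as described in the Cray aprun
documentation (-n total processing elements, -N processing elements per
node, -d CPUs reserved per processing element). The variable names are
illustrative and are not part of the submission script.

```shell
#!/bin/sh
# Illustrative arithmetic only; values are copied from the aprun line above.
n_pes=8         # aprun -n : total processing elements (MPI ranks)
pes_per_node=8  # aprun -N : ranks placed on each node
depth=5         # aprun -d : CPUs reserved for each rank
replicas=8      # namd2 +replicas

# CPUs reserved on the node, and how many ranks each replica receives.
cores_reserved=$((pes_per_node * depth))
ranks_per_replica=$((n_pes / replicas))

echo "cores reserved on the node: $cores_reserved"
echo "MPI ranks per replica:      $ranks_per_replica"
```

With these values every replica receives a single rank, which is
consistent with the "Running on 1 processors" line in the log. Whether the
other four CPUs reserved per rank can be used depends on how the namd2
binary was built; that is an assumption to check, not something the log
confirms.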

*##############################################################################*
*OUTPUT: The log file*
*##############################################################################*

Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (80-way SMP).
Info: NAMD 2.12 for CRAY-XC-MPI
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60701 for mpi-crayxc
Info: Built Thu Aug 30 22:48:00 IST 2018 by crayadm on clogin72
Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0030148 s
Info: 122.062 MB of memory in use based on /proc/self/stat
Info: Configuration file is job0.conf
Info: Working in the current directory /home/PolymerSimulationLab/anup.prasad/replica/example
TCL: Reduction callback proc set to save_callback
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 1
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 100
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 20000 steps
Info: FIRST LDB TIMESTEP 500
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: MIN ATOMS PER PATCH 40
Info: INITIAL TEMPERATURE 300
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 0.4
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: DCD FILENAME output/0/fold_alanin.job0.0.dcd
Info: DCD FREQUENCY 10000
Info: DCD FIRST STEP 10000
Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME output/0/fold_alanin.job0.0
Info: BINARY OUTPUT FILES WILL BE USED
Info: NO RESTART FILE
Info: SWITCHING ACTIVE
Info: SWITCHING ON 7
Info: SWITCHING OFF 8
Info: PAIRLIST DISTANCE 10
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 10
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 22.5
Info: ENERGY OUTPUT STEPS 100
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 1000
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 300
Info: LANGEVIN USING BBK INTEGRATOR
Info: LANGEVIN DAMPING COEFFICIENT IS 10 INVERSE PS
Info: LANGEVIN DYNAMICS APPLIED TO HYDROGENS
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RANDOM NUMBER SEED 45330
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB unfolded.pdb
Info: STRUCTURE FILE alanin.psf
Info: PARAMETER file: XPLOR format! (default)
Info: PARAMETERS alanin.params
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: SUMMARY OF PARAMETERS:
Info: 61 BONDS
Info: 179 ANGLES
Info: 38 DIHEDRAL
Info: 42 IMPROPER
Info: 0 CROSSTERM
Info: 21 VDW
Info: 0 VDW_PAIRS
Info: 0 NBTHOLE_PAIRS
Info: TIME FOR READING PSF FILE: 0.010375
Info: Reading pdb file unfolded.pdb
Info: TIME FOR READING PDB FILE: 0.014677
Info:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 66 ATOMS
Info: 65 BONDS
Info: 96 ANGLES
Info: 31 DIHEDRALS
Info: 32 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 198 DEGREES OF FREEDOM
Info: 55 HYDROGEN GROUPS
Info: 2 ATOMS IN LARGEST HYDROGEN GROUP
Info: 55 MIGRATION GROUPS
Info: 2 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 783.886 amu
Info: TOTAL CHARGE = 8.19564e-08 e
Info: *****************************
Info:
Info: Entering startup at 0.144568 s, 137.074 MB of memory in use
Info: Startup phase 0 took 6.29425e-05 s, 137.074 MB of memory in use
Info: ADDED 285 IMPLICIT EXCLUSIONS
Info: Startup phase 1 took 0.000128984 s, 137.074 MB of memory in use
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 705 POINTS
Info: ABSOLUTE IMPRECISION IN FAST TABLE ENERGY: 3.38813e-21 AT 7.99609
Info: RELATIVE IMPRECISION IN FAST TABLE ENERGY: 1.27241e-16 AT 7.99609
Info: ABSOLUTE IMPRECISION IN FAST TABLE FORCE: 6.77626e-21 AT 7.99609
Info: RELATIVE IMPRECISION IN FAST TABLE FORCE: 1.1972e-16 AT 7.99609
Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000290023 AT 0.251946
Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00563612 AT 7.01338
Info: Startup phase 2 took 0.000255108 s, 137.074 MB of memory in use
Info: Startup phase 3 took 3.88622e-05 s, 137.074 MB of memory in use
Info: Startup phase 4 took 3.50475e-05 s, 137.074 MB of memory in use
Info: Startup phase 5 took 3.50475e-05 s, 137.074 MB of memory in use
Info: ORIGINAL ATOMS MINMAX IS -0.225 0.403 -3.892 12.577 10.072 11.917
Info: ADJUSTED ATOMS MINMAX IS -0.064 1.044 -4.13837 12.106 9.464 11.1954
Info: PATCH GRID IS 1 BY 1 BY 1
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: REMOVING COM VELOCITY 0.40329 -0.784504 -0.237392
Info: LARGEST PATCH (0) HAS 66 ATOMS
Info: TORUS A SIZE 1 USING 0
Info: TORUS B SIZE 1 USING 0
Info: TORUS C SIZE 1 USING 0
Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
Info: Placed 100% of base nodes on same physical node as patch
Info: Startup phase 6 took 0.00018692 s, 137.074 MB of memory in use
Info: Startup phase 7 took 4.31538e-05 s, 137.074 MB of memory in use
Info: Startup phase 8 took 3.60012e-05 s, 137.074 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 9 took 3.98159e-05 s, 137.074 MB of memory in use
Info: CREATING 11 COMPUTE OBJECTS
Info: Startup phase 10 took 0.000169039 s, 137.074 MB of memory in use
Info: Startup phase 11 took 4.91142e-05 s, 137.074 MB of memory in use
Info: Startup phase 12 took 2.88486e-05 s, 137.074 MB of memory in use
Info: Finished startup at 0.145677 s, 137.074 MB of memory in use

TCL: Running for 1000 steps
ETITLE: TS BOND ANGLE DIHED IMPRP
              ELECT VDW BOUNDARY MISC
 KINETIC TOTAL TEMP POTENTIAL TOTAL3
     TEMPAVG

ENERGY: 0 44.6127 52.1232 8.6023 20.3617
          -191.4874 6.6678 0.0000 0.0000
 56.0338 -3.0859 284.8234 -59.1198 -2.6304
    284.8234

LDB: ============= START OF LOAD BALANCING ============== 0.149812
LDB: ============== END OF LOAD BALANCING =============== 0.149838
LDB: =============== DONE WITH MIGRATION ================ 0.149915
ENERGY: 100 25.4220 27.3112 9.6089 16.3712
          -184.0543 -7.0893 0.0000 0.0000
 61.1351 -51.2951 310.7532 -112.4302 -51.3369
    406.3113

ENERGY: 200 16.9458 30.3913 7.3391 8.6401
          -183.6067 -6.5710 0.0000 0.0000
 59.7920 -67.0695 303.9261 -126.8614 -67.3960
    316.7936

ENERGY: 300 13.7491 22.3623 6.1732 8.8093
          -187.3883 -5.2571 0.0000 0.0000
 57.6206 -83.9311 292.8887 -141.5517 -84.2299
    273.2067

ENERGY: 400 21.2640 20.9768 4.8943 9.7980
          -180.4271 -5.6044 0.0000 0.0000
 46.0176 -83.0808 233.9100 -129.0984 -83.3364
    282.4105

LDB: ============= START OF LOAD BALANCING ============== 0.170008
LDB: ============== END OF LOAD BALANCING =============== 0.170024
LDB: =============== DONE WITH MIGRATION ================ 0.170139
Info: Initial time: 1 CPUs 4.80666e-05 s/step 0.000556327 days/ns 137.074 MB memory
ENERGY: 500 20.9882 28.2117 5.2205 4.4678
          -186.8598 -7.4956 0.0000 0.0000
 57.1447 -78.3224 290.4697 -135.4671 -77.6672
    281.7287

LDB: ============= START OF LOAD BALANCING ============== 0.174067
LDB: ============== END OF LOAD BALANCING =============== 0.174084
LDB: =============== DONE WITH MIGRATION ================ 0.174143
ENERGY: 600 18.1510 24.6177 4.7830 5.4573
          -188.3651 -10.5077 0.0000 0.0000
 53.7618 -92.1019 273.2745 -145.8638 -91.8306
    275.7984

ENERGY: 700 20.5458 24.3071 3.0372 7.5664
          -185.8466 -5.3999 0.0000 0.0000
 51.6276 -84.1624 262.4260 -135.7899 -85.3503
    263.4592

ENERGY: 800 18.7512 23.4445 4.3903 11.5937
          -183.6307 -7.1014 0.0000 0.0000
 55.7138 -76.8385 283.1968 -132.5524 -76.3070
    290.4284

ENERGY: 900 17.7769 23.0947 6.4763 9.8487
          -188.0586 3.4246 0.0000 0.0000
 60.2771 -67.1603 306.3919 -127.4374 -66.7688
    310.5239
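
As a cross-check of the "Initial time" line in the log, the days/ns figure
can be recomputed from the reported s/step and the 1 fs timestep. This is
a minimal sketch; the variable names are illustrative, and the only inputs
are the two numbers printed in the log.

```shell
#!/bin/sh
# Values copied from the log: "TIMESTEP 1" (fs) and "4.80666e-05 s/step".
timestep_fs=1
sec_per_step=4.80666e-05

# 1 ns = 1e6 fs, so steps per ns = 1e6 / timestep_fs; one day = 86400 s.
days_per_ns=$(awk -v dt="$timestep_fs" -v s="$sec_per_step" \
    'BEGIN { printf "%.6g", s * (1e6 / dt) / 86400 }')

echo "days/ns: $days_per_ns"
```

The result agrees with the logged 0.000556327 days/ns to within rounding
of the printed s/step value. Note that the log reports this for "1 CPUs",
so it describes a single replica on one CPU and not the job as a whole.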

*Please help with suggestions.*

*Kind regards*

*Anup Kumar Prasad*

*Ph.D scholar, IITB-Monash Research Academy*

*Indian Institute of Technology Bombay, INDIA*
