Re: Is there a problem of ORCA running for NAMD MPI?

From: Marcelo C. R. Melo (melomcr_at_gmail.com)
Date: Tue Nov 20 2018 - 11:21:58 CST

I think there are two issues here:
1) Gerard is absolutely correct in pointing out that NAMD and ORCA should
not be competing for the same cores. Since they execute in parallel (NAMD
computes the MM part at the same time that ORCA computes the QM part), you
should arrange your hardware resources so that each program has its own
cores to run on. His setup is a good example: 2 cores for NAMD and 6 for
ORCA on an 8-core machine.

2) Francesco seems to be having a different issue, related to MPI: you
cannot (safely) launch an MPI application from within another MPI
application. So if you have the same 8-core machine as in the previous
example and want to run ORCA with "PAL6" (or PAL4, as in your
QMConfigLine), you need the NAMD-multicore build, not the NAMD-MPI build,
so that NAMD can be given 2 cores *and* launch an ORCA-MPI execution at
every step. This is an MPI limitation that needs to be kept in mind. Notice
that if you run ORCA on a single core, the issue does not occur because
ORCA will not make any MPI calls.
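
For the 8-core example, this is roughly what the setup looks like. The
config file name and ORCA keywords below are taken from your log (with
PAL4 raised to PAL6); the binary paths are placeholders you would adapt:

  # In the NAMD config (namd_ORCA-01.conf): ask ORCA for 6 MPI ranks.
  # ORCA generally needs its full path in order to run in parallel.
  qmConfigLine  "! B3LYP 6-31G Grid4 PAL6 EnGrad TightSCF"
  qmExecPath    "/path/to/orca_4.0.1/orca"

  # Launch the *multicore* NAMD build on the remaining 2 cores;
  # do not wrap it in mpirun/srun:
  /path/to/NAMD_2.12_multicore/namd2 +p2 namd_ORCA-01.conf > namd_ORCA-01.log

  # This is the combination that hits the MPI-inside-MPI problem:
  # mpirun -np 8 namd2-mpi namd_ORCA-01.conf   (NAMD-MPI plus PAL6 in ORCA)

That way, ORCA's internal mpirun is the only MPI launcher involved.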

At the end of the day, even if this limits the number of cores you can
give to your MM region, it will not matter much anyway, because the HF/DFT
calculation in ORCA (even in parallel) will very likely take much more
time than NAMD takes to calculate the classical part. If you are running
ORCA with PM3, consider another package such as MOPAC, which will be much
faster for semi-empirical calculations and can be used with NAMD-MPI, since
MOPAC's parallelism uses threads, not MPI.
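
For reference, a minimal sketch of the QM lines for a MOPAC run (the
executable path is a placeholder, and the keyword line is only an
illustration of a PM3 single-point setup; check it against the MOPAC
manual and adjust charge, multiplicity and thread count to your system):

  qmSoftware    "mopac"
  qmExecPath    "/path/to/MOPAC2016.exe"
  # 1SCF + AUX so NAMD can read back the results; THREADS keeps the
  # parallelism inside a single process, so no MPI is involved.
  qmConfigLine  "PM3 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LARGE CHARGE=0 SINGLET THREADS=6"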

Hope this helps.
Marcelo

---
Marcelo Cardoso dos Reis Melo
PhD Candidate
Luthey-Schulten Group
University of Illinois at Urbana-Champaign
crdsdsr2_at_illinois.edu
+1 (217) 244-5983
On Tue, 20 Nov 2018 at 09:34, Francesco Pietra <chiendarret_at_gmail.com>
wrote:
> OK. As I reported in my previous mail of a few days ago, concerning the same
> simulation on one node, I could reach ORCA step 7 by running NAMD 2.12
> QM/MM on a 4-core desktop, surely in a matter of a few hours at most,
> although I did not note the exact time (it was last year). I have now
> killed the simulation after 8 hr.
>
> The final part of the .TmpOut file, after 8 hr on 4 nodes, or 24 hr on one
> node, reads:
>
> Checking for AutoStart:
>> The File:
>> /gpfs/scratch/userexternal/fpietra0/QM-MM/NAMD_Example1_ORCA_24h_100GB_1node/0/qmmm_0.input.gbw
>> exists
>> Trying to determine its content:
>>      ... Fine, the file contains calculation information
>>      ... Fine, the calculation information was read
>>      ... Fine, the file contains a basis set
>>      ... Fine, the basis set was read
>>      ... Fine, the file contains a geometry
>>      ... Fine, the geometry was read
>>      ... The file does not contain orbitals - skipping AutoStart
>
>
> Does that tell you anything?
>
> Thanks a lot for your very useful intervention. I hope that bugs will be
> found in the way I set up the simulation. The systems in my project
> are so large that NAMD can only be run on a cluster.
> francesco
>
>
> On Tue, Nov 20, 2018 at 3:14 PM Gerard Rowe <GerardR_at_usca.edu> wrote:
>
>> I found a quirk in the way resources get allocated when running QM/MM
>> calculations.  On a single machine with 8 cores, if I launch NAMD with +p8,
>> orca runs extremely slowly during the QM phase because NAMD is still
>> holding onto the resources allocated to it during launch.  When I drop NAMD
>> down to 2 processors and run orca with PAL6, the calculations run much more
>> quickly.  It's important to recognize that Orca is running pretty much
>> independently of NAMD in its own working folder.  If your calculation is
>> taking a very long time to get through one cycle, you can check the .TmpOut
>> file generated in the working directory.
>>
>>
>> You can distinguish between a NAMD and Orca issue by copying the contents
>> of the QM working directory to another location and running Orca directly
>> on the input file.  For a system as small as yours, a single point
>> B3LYP/6-31G shouldn't take 3 hours.
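>>
>> For example, something along these lines (paths are placeholders; the
>> input file name is the one NAMD writes into the QM base directory):
>>
>>   cp -r /path/to/qm-base-dir/0 /tmp/orca-test
>>   cd /tmp/orca-test
>>   /full/path/to/orca qmmm_0.input > qmmm_0.check.out
>>
>> If that standalone run finishes in minutes, the problem is on the
>> NAMD/MPI side rather than in ORCA itself.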
>>
>>
>> Gerard Rowe
>>
>> University of South Carolina Aiken
>> ------------------------------
>> *From:* owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of
>> Francesco Pietra <chiendarret_at_gmail.com>
>> *Sent:* Tuesday, November 20, 2018 4:35:17 AM
>> *To:* NAMD
>> *Subject:* namd-l: Is there a problem of ORCA running for NAMD MPI?
>>
>> Hi
>> While running Example1 of the QM/MM tutorial, I wonder whether there is a
>> problem with my cluster concerning ORCA running under NAMD-MPI. After
>> failing to proceed beyond
>>
>> TCL: Minimizing for 100 steps
>> Info: List of ranks running QM simulations: 2
>>
>>
>> on one node (36 tasks, 1 CPU per task), I am now trying four nodes (144
>> tasks, 1 CPU per task), with little hope, given the small size of Example1.
>> After 3 hrs, the QM part is still running. Below is the log file. I hope
>> to get some advice on what I am unable to detect.
>> francesco pietra
>> Charm++> Running on MPI version: 3.0
>> Charm++> level of thread support used: MPI_THREAD_SINGLE (desired:
>> MPI_THREAD_SINGLE)
>> Charm++> Running in non-SMP mode: numPes 144
>> Charm++> Using recursive bisection (scheme 3) for topology aware
>> partitions
>> Converse/Charm++ Commit ID:
>> v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
>> Warning> Randomization of stack pointer is turned on in kernel, thread
>> migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space'
>> as root to disable it, or try run with '+isomalloc_sync'.
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> Running on 4 unique compute nodes (36-way SMP).
>> Charm++> cpu topology info is gathered in 0.042 seconds.
>> Info: NAMD 2.12 for Linux-x86_64-MPI
>> Info:
>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>> Info: for updates, documentation, and support information.
>> Info:
>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>> Info: in all publications reporting results obtained with NAMD.
>> Info:
>> Info: Based on Charm++/Converse 60701 for mpi-linux-x86_64
>> Info: Built mar 7 mar 2017, 17.38.45, CET by propro01 on node165
>> Info: 1 NAMD  2.12  Linux-x86_64-MPI  144    node419  fpietra0
>> Info: Running on 144 processors, 144 nodes, 4 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.229772 s
>> Info: 695.176 MB of memory in use based on /proc/self/stat
>> Info: Configuration file is namd_ORCA-01.conf
>> Info: Working in the current directory
>> /gpfs/scratch/userexternal/fpietra0/QM-MM/NAMD_Example1_ORCA_24h_4nodes
>> TCL: Suspending until startup complete.
>> Info: SIMULATION PARAMETERS:
>> Info: TIMESTEP               0.5
>> Info: NUMBER OF STEPS        0
>> Info: STEPS PER CYCLE        1
>> Info: PERIODIC CELL BASIS 1  29 0 0
>> Info: PERIODIC CELL BASIS 2  0 34 0
>> Info: PERIODIC CELL BASIS 3  0 0 28
>> Info: PERIODIC CELL CENTER   -0.021 0.008 0.108
>> Info: WRAPPING WATERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
>> Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
>> Info: LOAD BALANCER  Centralized
>> Info: LOAD BALANCING STRATEGY  New Load Balancers -- DEFAULT
>> Info: LDB PERIOD             200 steps
>> Info: FIRST LDB TIMESTEP     5
>> Info: LAST LDB TIMESTEP     -1
>> Info: LDB BACKGROUND SCALING 1
>> Info: HOM BACKGROUND SCALING 1
>> Info: PME BACKGROUND SCALING 1
>> Info: REMOVING LOAD FROM NODE 0
>> Info: REMOVING PATCHES FROM PROCESSOR 0
>> Info: MIN ATOMS PER PATCH    40
>> Info: INITIAL TEMPERATURE    300
>> Info: CENTER OF MASS MOVING INITIALLY? NO
>> Info: DIELECTRIC             1
>> Info: EXCLUDE                SCALED ONE-FOUR
>> Info: 1-4 ELECTROSTATICS SCALED BY 1
>> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
>> Info: DCD FILENAME           PolyAla_out.dcd
>> Info: DCD FREQUENCY          1
>> Info: DCD FIRST STEP         1
>> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
>> Info: XST FILENAME           PolyAla_out.xst
>> Info: XST FREQUENCY          1
>> Info: NO VELOCITY DCD OUTPUT
>> Info: NO FORCE DCD OUTPUT
>> Info: OUTPUT FILENAME        PolyAla_out
>> Info: RESTART FILENAME       PolyAla_out.restart
>> Info: RESTART FREQUENCY      100
>> Info: BINARY RESTART FILES WILL BE USED
>> Info: SWITCHING ACTIVE
>> Info: SWITCHING ON           10
>> Info: SWITCHING OFF          12
>> Info: PAIRLIST DISTANCE      14
>> Info: PAIRLIST SHRINK RATE   0.01
>> Info: PAIRLIST GROW RATE     0.01
>> Info: PAIRLIST TRIGGER       0.3
>> Info: PAIRLISTS PER CYCLE    2
>> Info: PAIRLISTS ENABLED
>> Info: MARGIN                 0.495
>> Info: HYDROGEN GROUP CUTOFF  2.5
>> Info: PATCH DIMENSION        16.995
>> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
>> Info: TIMING OUTPUT STEPS    1
>> Info: PRESSURE OUTPUT STEPS  1
>> Info: QM FORCES ACTIVE
>> Info: QM PDB PARAMETER FILE: PolyAla-qm.pdb
>> Info: QM SOFTWARE: orca
>> Info: QM ATOM CHARGES FROM QM SOFTWARE: MULLIKEN
>> Info: QM EXECUTABLE PATH:
>> /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca
>> Info: QM COLUMN: beta
>> Info: QM BOND COLUMN: occ
>> Info: QM WILL DETECT BONDS BETWEEN QM AND MM ATOMS.
>> Info: QM-MM BOND SCHEME: Charge Shift.
>> Info: QM BASE DIRECTORY:
>> /gpfs/scratch/userexternal/fpietra0/QM-MM/NAMD_Example1_ORCA_24h_100GB_1node
>> Info: QM CONFIG LINE: ! B3LYP 6-31G Grid4 PAL4 EnGrad TightSCF
>> Info: QM CONFIG LINE: %%output PrintLevel Mini Print[ P_Mulliken ] 1
>> Print[P_AtCharges_M] 1 end
>> Info: QM POINT CHARGES WILL BE SELECTED EVERY 1 STEPS.
>> Info: QM Point Charge Switching: ON.
>> Info: QM Point Charge SCHEME: none.
>> Info: QM executions per node: 1
>> Info: LANGEVIN DYNAMICS ACTIVE
>> Info: LANGEVIN TEMPERATURE   300
>> Info: LANGEVIN USING BBK INTEGRATOR
>> Info: LANGEVIN DAMPING COEFFICIENT IS 50 INVERSE PS
>> Info: LANGEVIN DYNAMICS APPLIED TO HYDROGENS
>> Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
>> Info:        TARGET PRESSURE IS 1.01325 BAR
>> Info:     OSCILLATION PERIOD IS 200 FS
>> Info:             DECAY TIME IS 100 FS
>> Info:     PISTON TEMPERATURE IS 300 K
>> Info:       PRESSURE CONTROL IS GROUP-BASED
>> Info:    INITIAL STRAIN RATE IS 0 0 0
>> Info:       CELL FLUCTUATION IS ISOTROPIC
>> Info: PARTICLE MESH EWALD (PME) ACTIVE
>> Info: PME TOLERANCE               1e-06
>> Info: PME EWALD COEFFICIENT       0.257952
>> Info: PME INTERPOLATION ORDER     4
>> Info: PME GRID DIMENSIONS         32 36 28
>> Info: PME MAXIMUM GRID SPACING    1
>> Info: Attempting to read FFTW data from system
>> Info: Attempting to read FFTW data from
>> FFTW_NAMD_2.12_Linux-x86_64-MPI_FFTW3.txt
>> Info: Optimizing 6 FFT steps.  1... 2... 3... 4... 5... 6...   Done.
>> Info: Writing FFTW data to FFTW_NAMD_2.12_Linux-x86_64-MPI_FFTW3.txt
>> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY      1
>> Info: USING VERLET I (r-RESPA) MTS SCHEME.
>> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
>> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
>> Info: RANDOM NUMBER SEED     7910881
>> Info: USE HYDROGEN BONDS?    NO
>> Info: COORDINATE PDB         PolyAla.pdb
>> Info: STRUCTURE FILE         PolyAla.psf
>> Info: PARAMETER file: CHARMM format!
>> Info: PARAMETERS             CHARMpars/toppar_all36_carb_glycopeptide.str
>> Info: PARAMETERS             CHARMpars/toppar_water_ions_namd.str
>> Info: PARAMETERS             CHARMpars/toppar_all36_na_nad_ppi_gdp_gtp.str
>> Info: PARAMETERS             CHARMpars/par_all36_carb.prm
>> Info: PARAMETERS             CHARMpars/par_all36_cgenff.prm
>> Info: PARAMETERS             CHARMpars/par_all36_lipid.prm
>> Info: PARAMETERS             CHARMpars/par_all36_na.prm
>> Info: PARAMETERS             CHARMpars/par_all36_prot.prm
>> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
>> Info: SKIPPING rtf SECTION IN STREAM FILE
>> Info: SKIPPING rtf SECTION IN STREAM FILE
>> Info: SKIPPING rtf SECTION IN STREAM FILE
>> Info: SUMMARY OF PARAMETERS:
>> Info: 937 BONDS
>> Info: 2734 ANGLES
>> Info: 6671 DIHEDRAL
>> Info: 203 IMPROPER
>> Info: 6 CROSSTERM
>> Info: 357 VDW
>> Info: 6 VDW_PAIRS
>> Info: 0 NBTHOLE_PAIRS
>> Info: TIME FOR READING PSF FILE: 0.0370231
>> Info: Reading pdb file PolyAla.pdb
>> Info: TIME FOR READING PDB FILE: 0.034543
>> Info:
>> Info: Using the following PDB file for QM parameters: PolyAla-qm.pdb
>> Info: Number of QM atoms (excluding Dummy atoms): 20
>> Info: We found 2 QM-MM bonds.
>> Info: Applying user defined multiplicity 1 to QM group ID 1
>> Info: 1) Group ID: 1 ; Group size: 20 atoms ; Total charge: 0
>> Info: MM-QM pair: 24:30 -> Value (distance or ratio): 1.09 (QM Group 0 ID
>> 1)
>> Info: MM-QM pair: 50:44 -> Value (distance or ratio): 1.09 (QM Group 0 ID
>> 1)
>> Info: ****************************
>> Info: STRUCTURE SUMMARY:
>> Info: 2279 ATOMS
>> Info: 1546 BONDS
>> Info: 879 ANGLES
>> Info: 199 DIHEDRALS
>> Info: 15 IMPROPERS
>> Info: 6 CROSSTERMS
>> Info: 0 EXCLUSIONS
>> Info: 6837 DEGREES OF FREEDOM
>> Info: 773 HYDROGEN GROUPS
>> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
>> Info: 773 MIGRATION GROUPS
>> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
>> Info: TOTAL MASS = 13773.9 amu
>> Info: TOTAL CHARGE = 2.98023e-08 e
>> Info: MASS DENSITY = 0.82848 g/cm^3
>> Info: ATOM DENSITY = 0.0825485 atoms/A^3
>> Info: *****************************
>> Info:
>> Info: Entering startup at 0.70037 s, 799.754 MB of memory in use
>> Info: Startup phase 0 took 0.00322795 s, 799.754 MB of memory in use
>> Info: The QM region will remove 19 bonds, 31 angles, 37 dihedrals, 3
>> impropers and 1 crossterms.
>> Info: ADDED 2624 IMPLICIT EXCLUSIONS
>> Info: Startup phase 1 took 0.709255 s, 799.887 MB of memory in use
>> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
>> Info: NONBONDED TABLE SIZE: 769 POINTS
>> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
>> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000324844 AT 11.9556
>> Info: ABSOLUTE IMPRECISION IN VDWA TABLE ENERGY: 4.59334e-32 AT 11.9974
>> Info: RELATIVE IMPRECISION IN VDWA TABLE ENERGY: 7.4108e-17 AT 11.9974
>> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
>> Info: ABSOLUTE IMPRECISION IN VDWB TABLE ENERGY: 1.53481e-26 AT 11.9974
>> Info: RELATIVE IMPRECISION IN VDWB TABLE ENERGY: 7.96691e-18 AT 11.9974
>> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
>> Info: Startup phase 2 took 0.0194581 s, 804.121 MB of memory in use
>> Info: Startup phase 3 took 0.000361919 s, 804.121 MB of memory in use
>> Info: Startup phase 4 took 0.00718594 s, 804.121 MB of memory in use
>> Info: Startup phase 5 took 0.000344038 s, 804.121 MB of memory in use
>> Info: PATCH GRID IS 3 (PERIODIC) BY 4 (PERIODIC) BY 3 (PERIODIC)
>> Info: PATCH GRID IS 2-AWAY BY 2-AWAY BY 2-AWAY
>> Info: REMOVING COM VELOCITY -0.188499 0.149382 0.0208025
>> Info: LARGEST PATCH (17) HAS 78 ATOMS
>> Info: TORUS A SIZE 144 USING 0 36 72 108
>> Info: TORUS B SIZE 1 USING 0
>> Info: TORUS C SIZE 1 USING 0
>> Info: TORUS MINIMAL MESH SIZE IS 109 BY 1 BY 1
>> Info: Placed 100% of base nodes on same physical node as patch
>> Info: Startup phase 6 took 0.0212991 s, 805.082 MB of memory in use
>> Info: PME using 16 and 18 processors for FFT and reciprocal sum.
>> Info: PME GRID LOCATIONS: 7 15 23 31 43 51 59 67 79 87 ...
>> Info: PME TRANS LOCATIONS: 11 19 27 35 39 47 55 63 71 83 ...
>> Info: PME USING 16 GRID NODES AND 18 TRANS NODES
>> Info: Startup phase 7 took 0.113867 s, 805.75 MB of memory in use
>> Info: Startup phase 8 took 0.00489211 s, 805.75 MB of memory in use
>> LDB: Central LB being created...
>> Info: Startup phase 9 took 0.0102289 s, 805.75 MB of memory in use
>> Info: CREATING 2736 COMPUTE OBJECTS
>> Info: Startup phase 10 took 0.0117202 s, 805.75 MB of memory in use
>> Info: useSync: 1 useProxySync: 0
>> Info: Building spanning tree ... send: 1 recv: 0 with branch factor 4
>> Info: Startup phase 11 took 0.00923896 s, 805.75 MB of memory in use
>> Info: Startup phase 12 took 0.000352859 s, 805.75 MB of memory in use
>> Info: Finished startup at 1.6118 s, 805.75 MB of memory in use
>>
>> TCL: Minimizing for 100 steps
>> Info: List of ranks running QM simulations: 2.
>> ................................
>>
>>
>>
>>
