Re: Tuning QM-MM with namd-orca on one cluster node

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Jan 31 2019 - 15:22:25 CST

Hi Marcelo:
First, thanks.
I moved away from MOPAC because I could not obtain SCF convergence, which was
not unexpected given the two iron ions. ORCA reached single-point
convergence in two runs of 125 iterations each (I was unable to set a flag
for more iterations: "maxiter #" on the qmConfigLine was not accepted, and a
perusal of the manual did not help me). I used ORCA extensively years ago
for CD simulations (excited states), but not since.
As to the size of the system, I am a biochemist and therefore interested in
real systems (which is no justification, I admit). Anyway, I used a very
sloppy DFT level and convergence settings, in the hope that this is still
more appropriate than a semiempirical method for my system.

I must correct my previous post, as I failed to notice the line
>
> Charm++> cpu affinity enabled.

In the new runs, described below, the affinity info was complete in namd.log:

> Charm++> cpu affinity enabled.
> [1] pthread affinity is: 1
> [3] pthread affinity is: 3
> [4] pthread affinity is: 4
> [2] pthread affinity is: 2
> [0] pthread affinity is: 0

I had run into trouble with PAL# before, and then (carelessly) forgot to
reactivate it; in my hands, the trouble remains. With either PAL8 or PAL4,
the error reported in /0/*TmpOut was

> There are not enough slots available in the system to satisfy the 4 slots
> that were requested by the application:
> /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca_gtoint_mpi

> Either request fewer slots for your application, or make more slots
> available
> for use.

Settings were
qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL4 SlowConv" (or PAL8)
qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36
#SBATCH --time=00:30:00
module load profile/archive
module load autoload openmpi/2.1.1--gnu--6.1.0
(Without loading MPI, the system complains that mpirun is unavailable and
crashes. I admit I am confused by that, because I thought MPI should not be
needed on a single node.)

/galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log

It seems that my settings are not making enough hardware available to ORCA,
despite the full node of 36 cores.
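
A possible explanation, which I have not verified on this cluster: Open MPI
appears to take its slot count from the number of SLURM tasks rather than
from --cpus-per-task, so --ntasks=1 would offer a single slot no matter how
many cores are reserved. The sketch below reads the PALn keyword from the
NAMD configuration and compares it with SLURM_NTASKS. The function names
pal_from_conf and check_slots are hypothetical helpers of my own, not part of
NAMD or ORCA.

```shell
#!/bin/bash
# Sketch: compare ORCA's PALn request with the MPI slots offered by SLURM.
# Assumption (unverified here): Open MPI counts one slot per SLURM task,
# so the slot count follows --ntasks, not --cpus-per-task.

# Print n from the first "PALn" keyword on a qmConfigLine (1 if none is set).
pal_from_conf() {
    local n
    n=$(grep -E '^[[:space:]]*qmConfigLine' "$1" \
        | grep -oE 'PAL[0-9]+' | head -n 1 | sed 's/PAL//') || true
    echo "${n:-1}"
}

# Succeed only if the requested PAL count fits in the available slots.
# $1 = NAMD config file, $2 = slots (defaults to SLURM_NTASKS, else 1).
check_slots() {
    local want have
    want=$(pal_from_conf "$1")
    have=${2:-${SLURM_NTASKS:-1}}
    if [ "$want" -gt "$have" ]; then
        echo "PAL$want needs $want slots, allocation provides $have" >&2
        return 1
    fi
    echo "PAL$want fits in $have slots"
}
```

In the job script this could be called as "check_slots namd-01.conf || exit 1"
before launching namd2, with #SBATCH --ntasks raised to match the PAL value.
Whether that alone satisfies orca_gtoint_mpi here remains to be tested.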

Thanks for advice

francesco

On Thu, Jan 31, 2019 at 8:08 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
wrote:

> Hi Francesco,
>
> The first line in your namd.log says
> "Info: Running on 5 processors, 1 nodes, 1 physical nodes."
> Which indicates NAMD is indeed using the 5 cores you requested with "+p5".
> Sometimes "top" will show just one process, but the CPU usage of that
> process will show 500%, for example, indicating 5 cores. This happens in
> some cluster management systems too.
>
> As for ORCA, your "qm config line" does not indicate you are requesting it
> to use multiple cores, so it most likely is really using just one. You
> should be using the keyword "PAL?", where the question mark indicates the
> number of requested cores: use "PAL8", for example, to ask for 8 cores.
> You should become familiar with the commands that control how ORCA
> distributes its computations in a cluster (their manual is very good), as
> you may need to provide a "hostfile" indicating the name(s) of the node(s)
> where ORCA will find available processors. This is something every cluster
> makes available when the queuing system reserves nodes for a job, so you
> should find out how to access that in your cluster.
>
> As a final note, even in parallel, calculating 341 QM atoms (QM system +
> link atoms) using DFT will be slow. Really slow. Maybe not 10 hours per
> timestep, but you just went from a medium-sized semi-empirical (parallel
> MOPAC) calculation to a large DFT one. Even in parallel, MOPAC could take a
> couple of seconds per timestep (depending on CPU power). ORCA/DFT will take
> much more than that.
>
> Best,
> Marcelo
> ---
> Marcelo Cardoso dos Reis Melo, PhD
> Postdoctoral Research Associate
> Luthey-Schulten Group
> University of Illinois at Urbana-Champaign
> crdsdsr2_at_illinois.edu
> +1 (217) 244-5983
>
>
> On Thu, 31 Jan 2019 at 12:27, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
>> Hello
>> Having obtained very good performance of NAMD(nightbuild)-MOPAC on one
>> cluster node on my system (large qm part, see below, including two iron
>> ions) , I am now trying the same with NAMD(nightbuild)-ORCA on the same
>> cluster (36 cores along two sockets). So far I was unable to have namd and
>> orca running on more than one core each.
>>
>> namd.conf
>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>> Print\[P_AtCharges_M\] 1 end"
>> (SCF already converged by omitting "enGrad")
>>
>> namd.job
>> #SBATCH --nodes=1
>> #SBATCH --ntasks=1
>> #SBATCH --cpus-per-task=36
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>> namd-01.conf +p5 +setcpuaffinity + showcpuaffinity > namd-01.log
>>
>> namd.log
>> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
>> Info: Number of QM atoms (excluding Dummy atoms): 315
>> Info: We found 26 QM-MM bonds.
>> Info: Applying user defined multiplicity 1 to QM group ID 1
>> Info: 1) Group ID: 1 ; Group size: 315 atoms ; Total PSF charge: -1
>> Info: Found user defined charge 1 for QM group ID 1. Will ignore PSF
>> charge.
>> Info: MM-QM pair: 180:191 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 208:195 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 243:258 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 273:262 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 296:313 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 324:317 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 358:373 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 394:377 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 704:724 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 742:728 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 756:769 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 799:788 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 820:830 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 864:851 -> Value (distance or ratio): 1.09 (QM Group 0
>> ID 1)
>> Info: MM-QM pair: 1461:1479 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 1511:1500 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 1532:1547 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 1566:1551 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 1933:1946 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 1991:1974 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 2011:2018 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 2050:2037 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 2072:2083 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 2098:2087 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 2139:2154 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> Info: MM-QM pair: 2174:2158 -> Value (distance or ratio): 1.09 (QM Group
>> 0 ID 1)
>> TCL: Minimizing for 200 steps
>> Info: List of ranks running QM simulations: 0.
>> Nothing about affinity!! (which was clearly displayed in MOPAC case)
>>
>> /0/qmm_0_input.TmpOut shows SCF ITERATIONS
>>
>> "top" showed a single PR for both namd and orca.
>> ___-
>> I had already tried a different job setting
>> #SBATCH --nodes=1
>> #SBATCH --ntasks-per-node=4
>> #SBATCH --ntasks-per-socket=2
>> module load profile/archive
>> module load autoload openmpi/2.1.1--gnu--6.1.0
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>> namd-01.conf +p5 > namd-01.log
>>
>> Here too, "top" showed a single PR for both namd and orca, so that in
>> about 20 hours, namd.log was at "ENERGY 2", indicating that 1400 hrs were
>> needed to complete the simulation.
>>
>> Thanks for advice
>> francesco pietra
>>
>>
>>

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2020 - 23:17:10 CST