Re: Tuning QM-MM with namd-orca on one cluster node

From: Marcelo C. R. Melo (melomcr_at_gmail.com)
Date: Thu Jan 31 2019 - 16:41:35 CST

They are referring to tasks, not nodes. One could request 8 tasks on a
4-core multi-threaded system, for example (though that would not be advisable
in your case). Does that make sense?

As I mentioned in my previous e-mail, you should check the commands that
control how ORCA distributes its computations in a cluster, as you may need
to provide a "hostfile" indicating the name(s) of the node(s) where ORCA
will find available processors. This is something every cluster makes
available when the queuing system reserves nodes for a job, so you should
find out how to access that in your cluster.

I imagine the cluster's MPI system is not making the tasks available when
ORCA calls MPI.
And yes, ORCA will use MPI even to parallelize within a single node.
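
Under SLURM, for instance, something along these lines can build such a
hostfile from the job's allocation (just a sketch: it assumes OpenMPI and that
your site sets SLURM_JOB_NODELIST and SLURM_CPUS_ON_NODE; the file name
orca.hosts is arbitrary):

# Build an OpenMPI hostfile listing each node reserved for this job,
# with a "slots=" count telling mpirun how many processes it may start per node.
scontrol show hostnames "$SLURM_JOB_NODELIST" \
    | awk -v n="$SLURM_CPUS_ON_NODE" '{print $0 " slots=" n}' > orca.hosts

How that file then reaches ORCA depends on the installation (the ORCA manual
explains how to hand extra arguments to mpirun), so check with your cluster's
documentation or admins.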

Best

---
Marcelo Cardoso dos Reis Melo, PhD
Postdoctoral Research Associate
Luthey-Schulten Group
University of Illinois at Urbana-Champaign
crdsdsr2_at_illinois.edu
+1 (217) 244-5983
On Thu, 31 Jan 2019 at 15:30, Francesco Pietra <chiendarret_at_gmail.com>
wrote:
> Are PAL4 and PAL8 expecting four or eight nodes, respectively, rather than
> cores?
>
> ---------- Forwarded message ---------
> From: Francesco Pietra <chiendarret_at_gmail.com>
> Date: Thu, Jan 31, 2019 at 10:22 PM
> Subject: Re: namd-l: Tuning QM-MM with namd-orca on one cluster node
> To: Marcelo C. R. Melo <melomcr_at_gmail.com>
> Cc: NAMD <namd-l_at_ks.uiuc.edu>
>
>
> Hi Marcelo:
> First, thanks.
> I moved away from MOPAC because I could not obtain SCF convergence, which was
> not unexpected given the two iron ions. ORCA reached single-point
> convergence in two runs of 125 iterations each (I was unable to set a flag
> for more iterations: "maxiter #" on the qmConfigLine was not accepted, and a
> perusal of the manual did not help me). I used ORCA extensively years ago
> for CD simulations (excited states), but not since.
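> Coming back to the iteration limit: perhaps an extra qmConfigLine using ORCA's
> %scf block, something like
> qmConfigLine    "%%scf MaxIter 500 end"
> (with 500 only as an example value) would be the way to raise it, but I have
> not tested that here.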
> As to the size of the system: I am a biochemist, and therefore interested in
> real systems (which is no justification, I admit). Anyway, I used a rather
> sloppy DFT setup and loose convergence settings, in the hope that it is still
> more appropriate than semiempirical methods for my system.
>
> I must correct my previous post, as I failed to notice the line
>>
>> Charm++> cpu affinity enabled.
>
> In the new runs described below, the affinity info in namd.log was complete:
>
>> Charm++> cpu affinity enabled.
>> [1] pthread affinity is:  1
>> [3] pthread affinity is:  3
>> [4] pthread affinity is:  4
>> [2] pthread affinity is:  2
>> [0] pthread affinity is:  0
>
>
> I had run into trouble with PAL# before, and then (unwisely) forgot to
> reactivate it; in my hands, that trouble remains. That is, with either PAL8
> or PAL4, the error revealed in /0/*TmpOut was
>
>> There are not enough slots available in the system to satisfy the 4 slots
>> that were requested by the application:
>>   /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca_gtoint_mpi
>>
>> Either request fewer slots for your application, or make more slots
>> available for use.
>
>
> Settings were
> qmConfigLine    "! UKS BP86 RI SV def2/J enGrad PAL4 SlowConv" (or PAL8)
> qmConfigLine    "%%output Printlevel Mini Print\[ P_Mulliken \] 1
> Print\[P_AtCharges_M\] 1 end"
>
>
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=36
> #SBATCH --time=00:30:00
> module load profile/archive
> module load autoload openmpi/2.1.1--gnu--6.1.0
> (Without loading the MPI module, the system complains that mpirun is
> unavailable and crashes. I must admit I am confused by that, because I thought
> MPI should not be needed for a single node.)
>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>
> It seems that my settings are not making enough hardware available to ORCA,
> despite the full node of 36 cores.
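> Would it perhaps be better to request one MPI task per ORCA process, so that
> OpenMPI sees enough slots for PAL8? Something like the following (just a guess
> on my part, to be checked against the cluster documentation; the namd2 +p5
> threads would then have to fit within the remaining allocation):
>
> #SBATCH --nodes=1
> #SBATCH --ntasks=8              # one MPI slot for each ORCA process (PAL8)
> #SBATCH --cpus-per-task=4       # leaves cores for the namd2 +p5 threads
> #SBATCH --time=00:30:00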
>
> Thanks for advice
>
> francesco
>
>
>
> On Thu, Jan 31, 2019 at 8:08 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
> wrote:
>
>> Hi Francesco,
>>
>> The first line in your namd.log says
>> "Info: Running on 5 processors, 1 nodes, 1 physical nodes."
>> This indicates NAMD is indeed using the 5 cores you requested with
>> "+p5". Sometimes "top" will show just one process, but the CPU usage of
>> that process will read 500%, for example, indicating 5 cores. This happens
>> in some cluster management systems too.
>>
>> As for ORCA, your "qm config line" does not indicate that you are requesting
>> multiple cores, so it is most likely using just one. You should use the
>> keyword "PAL?", where the question mark stands for the number of requested
>> cores: "PAL8", for example, asks for 8 cores.
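>> In your case the config line would then look something like
>> qmConfigLine    "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"
>> with the number adjusted to however many cores you actually reserve.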
>> You should familiarize yourself with the commands that control how ORCA
>> distributes its computations in a cluster (their manual is very good), as
>> you may need to provide a "hostfile" indicating the name(s) of the node(s)
>> where ORCA will find available processors. This is something every cluster
>> makes available when the queuing system reserves nodes for a job, so you
>> should find out how to access that in your cluster.
>>
>> As a final note, even in parallel, calculating 341 QM atoms (QM system +
>> link atoms) using DFT will be slow. Really slow. Maybe not 10 hours per
>> timestep, but you just went from a medium-sized semi-empirical (parallel
>> MOPAC) calculation to a large DFT one. Even in parallel, MOPAC could take a
>> couple of seconds per timestep (depending on CPU power). ORCA/DFT will take
>> much more than that.
>>
>> Best,
>> Marcelo
>> ---
>> Marcelo Cardoso dos Reis Melo, PhD
>> Postdoctoral Research Associate
>> Luthey-Schulten Group
>> University of Illinois at Urbana-Champaign
>> crdsdsr2_at_illinois.edu
>> +1 (217) 244-5983
>>
>>
>> On Thu, 31 Jan 2019 at 12:27, Francesco Pietra <chiendarret_at_gmail.com>
>> wrote:
>>
>>> Hello
>>> Having obtained very good performance from NAMD(nightbuild)-MOPAC on one
>>> cluster node for my system (large QM part, see below, including two iron
>>> ions), I am now trying the same with NAMD(nightbuild)-ORCA on the same
>>> cluster (36 cores across two sockets). So far I have been unable to get namd
>>> and orca to run on more than one core each.
>>>
>>> namd.conf
>>> qmConfigLine    "! UKS BP86 RI SV def2/J enGrad SlowConv"
>>> qmConfigLine    "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>>> Print\[P_AtCharges_M\] 1 end"
>>> (SCF already converged by omitting "enGrad")
>>>
>>> namd.job
>>> #SBATCH --nodes=1
>>> #SBATCH --ntasks=1
>>> #SBATCH --cpus-per-task=36
>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>>
>>> namd.log
>>> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
>>> Info: Number of QM atoms (excluding Dummy atoms): 315
>>> Info: We found 26 QM-MM bonds.
>>> Info: Applying user defined multiplicity 1 to QM group ID 1
>>> Info: 1) Group ID: 1 ; Group size: 315 atoms ; Total PSF charge: -1
>>> Info: Found user defined charge 1 for QM group ID 1. Will ignore PSF
>>> charge.
>>> Info: MM-QM pair: 180:191 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 208:195 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 243:258 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 273:262 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 296:313 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 324:317 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 358:373 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 394:377 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 704:724 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 742:728 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 756:769 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 799:788 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 820:830 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 864:851 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1461:1479 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1511:1500 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1532:1547 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1566:1551 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1933:1946 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 1991:1974 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2011:2018 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2050:2037 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2072:2083 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2098:2087 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2139:2154 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> Info: MM-QM pair: 2174:2158 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>> TCL: Minimizing for 200 steps
>>> Info: List of ranks running QM simulations: 0.
>>> Nothing about affinity!! (which was clearly displayed in the MOPAC case)
>>>
>>> /0/qmm_0_input.TmpOut shows SCF ITERATIONS
>>>
>>> "top" shown a single PR for both namd and orca.
>>> ____
>>> I had already tried a different job setting:
>>> #SBATCH --nodes=1
>>> #SBATCH --ntasks-per-node=4
>>> #SBATCH --ntasks-per-socket=2
>>> module load profile/archive
>>> module load autoload openmpi/2.1.1--gnu--6.1.0
>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>> namd-01.conf +p5 > namd-01.log
>>>
>>> Here too, "top" showed a single PR for both namd and orca, so that in
>>> about 20 hous, namd.log  was at "ENERGY 2", indicating that 1400 hrs were
>>> needed to complete the simulation.
>>>
>>> Thanks for advice
>>> francesco pietra
>>>
>>>
>>>

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2020 - 23:17:10 CST