Re: Tuning QM-MM with namd-orca on one cluster node

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Fri Feb 01 2019 - 09:24:44 CST

Hi Marcelo
Now that the !PAL directive has been fixed, I have looked, so far
unsuccessfully, for how to implement the %pal block

!BP86 ...
%pal
nproc 34
end

in place of the !PAL directive in

qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"
qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
Print\[P_AtCharges_M\] 1 end"

in order to go beyond the PAL8 limit and exploit all the cores of the node
(while also leaving open a multinode setup).
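
What I would have expected to work is something like the following (an
untested sketch on my part; as far as I understand, NAMD passes each
qmConfigLine verbatim into the ORCA input, and the "%" has to be doubled, as
in the %%output line above; the manual seems to spell the keyword "nprocs"):

qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
qmConfigLine "%%pal nprocs 34 end"

keeping the %%output line unchanged. Is this the intended way to pass a %pal
block through qmConfigLine?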

thanks for advice
francesco
PS As far as I understand, on "my" cluster, requesting nearly all the memory
(as I did) gives me exclusive use of the node that is allocated.

On Thu, Jan 31, 2019 at 11:42 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
wrote:

> They are referring to tasks, not nodes. One could request 8 tasks on a
> 4-core multi-threaded system, for example. Does that make sense? (Though
> that would not be advisable in your case.)
>
> As I mentioned in my previous e-mail, you should check the commands that
> control how ORCA distributes its computations in a cluster, as you may need
> to provide a "hostfile" indicating the name(s) of the node(s) where ORCA
> will find available processors. This is something every cluster makes
> available when the queuing system reserves nodes for a job, so you should
> find out how to access that in your cluster.
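>
> For example (an untested sketch, assuming a SLURM queue and OpenMPI behind
> ORCA; the hostfile name is arbitrary), something like this in the job
> script would dump the allocated node names to a plain hostfile:
>
> scontrol show hostnames "$SLURM_JOB_NODELIST" > my.hostfile
>
> If ORCA's mpirun does not pick up the allocation by itself, OpenMPI can
> then be pointed at that file, e.g. through its default-hostfile MCA
> parameter (export OMPI_MCA_orte_default_hostfile=$PWD/my.hostfile).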
>
> I imagine the cluster's MPI system is not making the tasks available when
> ORCA calls MPI.
> And yes, ORCA will use MPI even to parallelize within a single node.
>
> Best
> ---
> Marcelo Cardoso dos Reis Melo, PhD
> Postdoctoral Research Associate
> Luthey-Schulten Group
> University of Illinois at Urbana-Champaign
> crdsdsr2_at_illinois.edu
> +1 (217) 244-5983
>
>
> On Thu, 31 Jan 2019 at 15:30, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
>> Are PAL4 and PAL8 expecting four or eight nodes, respectively, rather
>> than cores?
>>
>> ---------- Forwarded message ---------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Thu, Jan 31, 2019 at 10:22 PM
>> Subject: Re: namd-l: Tuning QM-MM with namd-orca on one cluster node
>> To: Marcelo C. R. Melo <melomcr_at_gmail.com>
>> Cc: NAMD <namd-l_at_ks.uiuc.edu>
>>
>>
>> Hi Marcelo:
>> First, thanks.
>> I moved away from MOPAC as I could not obtain SCF convergence, which was
>> not unexpected because of the two iron ions. ORCA reached single-point
>> convergence in two runs of 125 iterations each (I was unable to set a flag
>> for more iterations: "maxiter #" on the qmConfigLine was not accepted, and
>> a perusal of the manual did not help me). I used ORCA extensively years ago
>> for CD simulations (excited states), but never since.
>> As to the size of the system, I am a biochemist and therefore interested in
>> real systems (which is no justification, I admit). Anyway, I used a rather
>> sloppy DFT setup and loose convergence criteria in the hope that it is
>> still more appropriate than semiempirical for my system.
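>>
>> (Regarding the iteration limit: if I now read the ORCA manual correctly,
>> the SCF iteration cap goes into an %scf block rather than being a bare
>> keyword, so perhaps an extra line such as
>>
>> qmConfigLine "%%scf MaxIter 300 end"
>>
>> would have been accepted; 300 is just a placeholder value, and I have not
>> tried this.)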
>>
>> I must correct my previous post, as I failed to notice the line
>>>
>>> Charm++> cpu affinity enabled.
>>
>> In the new runs described below, the affinity info was complete in namd.log:
>>
>>> Charm++> cpu affinity enabled.
>>> [1] pthread affinity is: 1
>>> [3] pthread affinity is: 3
>>> [4] pthread affinity is: 4
>>> [2] pthread affinity is: 2
>>> [0] pthread affinity is: 0
>>
>>
>> I had run into trouble with PAL# before and then (wrongly) forgot to
>> reactivate it but, in my hands, such trouble remains. That is, with either
>> PAL8 or PAL4, the error revealed in /0/*TmpOut was
>>
>>> There are not enough slots available in the system to satisfy the 4
>>> slots
>>> that were requested by the application:
>>> /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca_gtoint_mpi
>>
>>
>>> Either request fewer slots for your application, or make more slots
>>> available
>>> for use.
>>
>>
>> Settings were
>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL4 SlowConv" (or PAL8)
>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>> Print\[P_AtCharges_M\] 1 end"
>>
>>
>> #SBATCH --nodes=1
>> #SBATCH --ntasks=1
>> #SBATCH --cpus-per-task=36
>> #SBATCH --time=00:30:00
>> module load profile/archive
>> module load autoload openmpi/2.1.1--gnu--6.1.0 (without loading MPI, the
>> system complains that mpirun is unavailable and crashes. I must admit to
>> being confused about that, because for a single node MPI should not be
>> needed)
>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>
>> It seems that my settings are not providing enough hardware to ORCA,
>> despite the full node of 36 cores.
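>>
>> If the problem is simply that mpirun sees my single SLURM task as only one
>> available slot, perhaps a request along these lines (untested, and the
>> interplay with NAMD's +p5 threads would still need checking) would expose
>> enough slots to ORCA:
>>
>> #SBATCH --nodes=1
>> #SBATCH --ntasks=36
>> #SBATCH --cpus-per-task=1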
>>
>> Thanks for advice
>>
>> francesco
>>
>>
>>
>> On Thu, Jan 31, 2019 at 8:08 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
>> wrote:
>>
>>> Hi Francesco,
>>>
>>> The first line in your namd.log says
>>> "Info: Running on 5 processors, 1 nodes, 1 physical nodes."
>>> Which indicates NAMD is indeed using the 5 cores you requested with
>>> "+p5". Some times "top" will show just one process, but the CPU usage of
>>> the process will show 500%, for example, indicating 5 cores. This happens
>>> in some cluster management systems too.
>>>
>>> As for ORCA, your "qm config line" does not indicate you are requesting
>>> it to use multiple cores, so it most likely is really using just one. You
>>> should be using the keyword "PAL?", where the question mark indicates the
>>> number of requested cores: use "PAL8", for example, to ask for 8 cores.
>>> You should become familiar with the commands that control how ORCA
>>> distributes its computations in a cluster (their manual is very good), as
>>> you may need to provide a "hostfile" indicating the name(s) of the node(s)
>>> where ORCA will find available processors. This is something every cluster
>>> makes available when the queuing system reserves nodes for a job, so you
>>> should find out how to access that in your cluster.
>>>
>>> As a final note, even in parallel, calculating 341 QM atoms (QM system +
>>> link atoms) using DFT will be slow. Really slow. Maybe not 10 hours per
>>> timestep, but you just went from a medium-sized semi-empirical (parallel
>>> MOPAC) calculation to a large DFT one. Even in parallel, MOPAC could take a
>>> couple of seconds per timestep (depending on CPU power). ORCA/DFT will take
>>> much more than that.
>>>
>>> Best,
>>> Marcelo
>>> ---
>>> Marcelo Cardoso dos Reis Melo, PhD
>>> Postdoctoral Research Associate
>>> Luthey-Schulten Group
>>> University of Illinois at Urbana-Champaign
>>> crdsdsr2_at_illinois.edu
>>> +1 (217) 244-5983
>>>
>>>
>>> On Thu, 31 Jan 2019 at 12:27, Francesco Pietra <chiendarret_at_gmail.com>
>>> wrote:
>>>
>>>> Hello
>>>> Having obtained very good performance from NAMD(nightbuild)-MOPAC on one
>>>> cluster node with my system (large QM part, see below, including two iron
>>>> ions), I am now trying the same with NAMD(nightbuild)-ORCA on the same
>>>> cluster (36 cores across two sockets). So far I have been unable to get
>>>> namd and orca to run on more than one core each.
>>>>
>>>> namd.conf
>>>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
>>>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>>>> Print\[P_AtCharges_M\] 1 end"
>>>> (SCF already converged by omitting "enGrad")
>>>>
>>>> namd.job
>>>> #SBATCH --nodes=1
>>>> #SBATCH --ntasks=1
>>>> #SBATCH --cpus-per-task=36
>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>>>
>>>> namd.log
>>>> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
>>>> Info: Number of QM atoms (excluding Dummy atoms): 315
>>>> Info: We found 26 QM-MM bonds.
>>>> Info: Applying user defined multiplicity 1 to QM group ID 1
>>>> Info: 1) Group ID: 1 ; Group size: 315 atoms ; Total PSF charge: -1
>>>> Info: Found user defined charge 1 for QM group ID 1. Will ignore PSF
>>>> charge.
>>>> Info: MM-QM pair: 180:191 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 208:195 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 243:258 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 273:262 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 296:313 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 324:317 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 358:373 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 394:377 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 704:724 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 742:728 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 756:769 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 799:788 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 820:830 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 864:851 -> Value (distance or ratio): 1.09 (QM Group
>>>> 0 ID 1)
>>>> Info: MM-QM pair: 1461:1479 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 1511:1500 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 1532:1547 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 1566:1551 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 1933:1946 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 1991:1974 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 2011:2018 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 2050:2037 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 2072:2083 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 2098:2087 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 2139:2154 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> Info: MM-QM pair: 2174:2158 -> Value (distance or ratio): 1.09 (QM
>>>> Group 0 ID 1)
>>>> TCL: Minimizing for 200 steps
>>>> Info: List of ranks running QM simulations: 0.
>>>> Nothing about affinity!! (which was clearly displayed in the MOPAC case)
>>>>
>>>> /0/qmm_0_input.TmpOut shows SCF ITERATIONS
>>>>
>>>> "top" shown a single PR for both namd and orca.
>>>> ___-
>>>> I had already tried a different job setting
>>>> #SBATCH --nodes=1
>>>> #SBATCH --ntasks-per-node=4
>>>> #SBATCH --ntasks-per-socket=2
>>>> module load profile/archive
>>>> module load autoload openmpi/2.1.1--gnu--6.1.0
>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>> namd-01.conf +p5 > namd-01.log
>>>>
>>>> Here too, "top" showed a single process for both namd and orca, so that
>>>> in about 20 hours namd.log was only at "ENERGY 2", indicating that about
>>>> 1400 hours would be needed to complete the simulation.
>>>>
>>>> Thanks for advice
>>>> francesco pietra
>>>>
>>>>
>>>>

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:20:28 CST