Re: Tuning QM-MM with namd-orca on one cluster node

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Mon Feb 04 2019 - 01:54:31 CST

Hi Marcelo:
After deleting (obviously) "PAL8" from the first qmConfigLine, /0/*TmpOut shows
"Program running on 34 parallel MPI processes", in accordance with the "top"
command. Great! However, scaling is poor: 63 SCF iterations in 30 min, only
marginally better than with 8 parallel MPI processes (PAL8): 60 SCF
iterations in 30 min.

Therefore, it seems to me that there is no point in engaging more than one
node.
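
For the record, the lines that gave the 34 MPI processes were essentially your
suggestion, with PAL8 removed from the first one:

qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"
qmConfigLine "%%pal nproc 34 end"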

Given these prospects, I doubt that enGrad SCF convergence will be reached
within the 24 hr allowed at the cluster. Yes, your warning about so many QM
atoms was not out of place; still, no fewer QM atoms could be set anyway.

What else I could try is to investigate more thoroughly the low-spin vs
high-spin state of the two iron ions (on the basis of published Mössbauer
experiments). Probably, better models for the iron ions are also needed than
those currently provided by CHARMM36.
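
If I read the NAMD interface correctly, the spin state handed to ORCA is set
on the NAMD side, so a high-spin trial should only require changing the
multiplicity there, along the lines of (the value below is only a placeholder,
not the one indicated by the Mössbauer data):

qmMult "1 11"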

Thanks a lot
francesco

On Mon, Feb 4, 2019 at 12:04 AM Marcelo C. R. Melo <melomcr_at_gmail.com>
wrote:

> Hi Francesco,
>
> NAMD's input file is read and interpreted as standard Tcl; that is why,
> for example, you can "set" a variable or make "if" statements.
> In your case, and in any case that involves defining strings or text, you
> need to be aware of "escape" characters, like the one for the percent sign.
> You can see that in action in the second qmConfigLine, the one that
> requests output of Mulliken atomic charges: there are two percent signs at
> the beginning of the string, yet NAMD writes only one percent sign in the
> input file for ORCA. You will notice the backslashes escaping the
> brackets as well.
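> For instance, a config line like
>
> qmConfigLine "%%output Print\[ P_Mulliken \] 1 end"
>
> ends up in the ORCA input file as
>
> %output Print[ P_Mulliken ] 1 end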
>
> I think your extended parallelism "block" should look like the following
> in NAMD's config file:
>
> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"
> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
> Print\[P_AtCharges_M\] 1 end"
> qmConfigLine "%%pal nproc 34 end"
>
> or maybe (if you prefer the multiple-line form)
>
> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"
> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
> Print\[P_AtCharges_M\] 1 end"
> qmConfigLine "%%pal"
> qmConfigLine "nproc 34"
> qmConfigLine "end"
>
> Let me know how that goes,
> Marcelo
> ---
> Marcelo Cardoso dos Reis Melo, PhD
> Postdoctoral Research Associate
> Luthey-Schulten Group
> University of Illinois at Urbana-Champaign
> crdsdsr2_at_illinois.edu
> +1 (217) 244-5983
>
>
> On Sat, 2 Feb 2019 at 03:07, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
>> To be specific, in my attempts "%" is interpreted as "0x49b8ef0al" in
>> the qmConfigLine (or externally, when I try to have the qmConfigLine read an
>> external file containing
>>
>> ! UKS BP86 RI SV def2/J enGrad SlowConv
>> %pal
>> nproc 34
>> end
>>
>> I tried all the symbols that should be accepted in the qmConfigLine: !, [, %,
>> *, $; so I can't figure out how the provided namd.conf could be adapted to
>> a cluster.
>>
>> all the best
>> francesco
>>
>> ---------- Forwarded message ---------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Fri, Feb 1, 2019 at 4:24 PM
>> Subject: Re: namd-l: Tuning QM-MM with namd-orca on one cluster node
>> To: Marcelo C. R. Melo <melomcr_at_gmail.com>
>> Cc: NAMD <namd-l_at_ks.uiuc.edu>
>>
>>
>> Hi Marcelo
>> Now that the !PAL directive has been fixed, I looked (unsuccessfully) for
>> a way to implement the %pal directive
>>
>> !BP86 ...
>> %pal
>> nproc 34
>> end
>>
>> in place of the !PAL directive in
>>
>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"
>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>> Print\[P_AtCharges_M\] 1 end"
>>
>> in order to go beyond the PAL8 limit and exploit all the core resources of
>> the node (while also allowing a multi-node setup).
>>
>> thanks for advice
>> francesco
>> PS As far as I can understand, on "my" cluster requesting nearly all the
>> memory (as I did) reserves the allocated node exclusively for me.
>>
>>
>>
>> On Thu, Jan 31, 2019 at 11:42 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
>> wrote:
>>
>>> They are referring to tasks, not nodes. One could request 8 tasks on a
>>> 4-core multi-threaded system, for example. Does that make sense? (Though that
>>> would not be advisable in your case.)
>>>
>>> As I mentioned in my previous e-mail, you should check the commands that
>>> control how ORCA distributes its computations in a cluster, as you may need
>>> to provide a "hostfile" indicating the name(s) of the node(s) where ORCA
>>> will find available processors. This is something every cluster makes
>>> available when the queuing system reserves nodes for a job, so you should
>>> find out how to access that in your cluster.
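>>> On SLURM, for instance (just a sketch; the hostfile name here is arbitrary),
>>> the list of allocated nodes can typically be dumped with something like
>>>
>>> scontrol show hostnames "$SLURM_JOB_NODELIST" > orca.hosts
>>>
>>> and then handed to ORCA's MPI layer in whatever way your MPI installation
>>> expects.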
>>>
>>> I imagine the cluster's MPI system is not making the tasks available
>>> when ORCA calls MPI.
>>> And yes, ORCA will use MPI even to parallelize within a single node.
>>>
>>> Best
>>> ---
>>> Marcelo Cardoso dos Reis Melo, PhD
>>> Postdoctoral Research Associate
>>> Luthey-Schulten Group
>>> University of Illinois at Urbana-Champaign
>>> crdsdsr2_at_illinois.edu
>>> +1 (217) 244-5983
>>>
>>>
>>> On Thu, 31 Jan 2019 at 15:30, Francesco Pietra <chiendarret_at_gmail.com>
>>> wrote:
>>>
>>>> Are PAL4 and PAL8 expecting four or eight nodes, respectively, rather
>>>> than cores?
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Francesco Pietra <chiendarret_at_gmail.com>
>>>> Date: Thu, Jan 31, 2019 at 10:22 PM
>>>> Subject: Re: namd-l: Tuning QM-MM with namd-orca on one cluster node
>>>> To: Marcelo C. R. Melo <melomcr_at_gmail.com>
>>>> Cc: NAMD <namd-l_at_ks.uiuc.edu>
>>>>
>>>>
>>>> Hi Marcelo:
>>>> First, thanks.
>>>> I moved away from MOPAC as I could not obtain SCF convergence, which
>>>> was not unexpected because of the two iron ions. ORCA reached single-point
>>>> convergence in two runs of 125 iterations each (I was unable to set a flag
>>>> for more iterations: "maxiter #" on the qmConfigLine was not accepted, and a
>>>> perusal of the manual did not help me). I used ORCA extensively years ago
>>>> for CD simulations (excited states), but never since.
>>>> As to the size of the system, I am a biochemist and therefore interested
>>>> in real systems (which is no justification, I admit). Anyway, I used a very
>>>> sloppy DFT setup and convergence criteria in the hope that it is still more
>>>> appropriate than semiempirical for my system.
>>>>
>>>> I must correct my previous post, as I failed to notice the line
>>>>>
>>>>> Charm++> cpu affinity enabled.
>>>>
>>>> In new runs, described below, affinity info was complete in namd.log
>>>>
>>>>> Charm++> cpu affinity enabled.
>>>>> [1] pthread affinity is: 1
>>>>> [3] pthread affinity is: 3
>>>>> [4] pthread affinity is: 4
>>>>> [2] pthread affinity is: 2
>>>>> [0] pthread affinity is: 0
>>>>
>>>>
>>>> I had run into trouble with PAL# before and then (unfortunately) forgot to
>>>> reactivate it, but, in my hands, that trouble remains. I.e., with either
>>>> PAL8 or PAL4, the error revealed in /0/*TmpOut was
>>>>
>>>>> There are not enough slots available in the system to satisfy the 4
>>>>> slots
>>>>> that were requested by the application:
>>>>> /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca_gtoint_mpi
>>>>
>>>>
>>>>> Either request fewer slots for your application, or make more slots
>>>>> available
>>>>> for use.
>>>>
>>>>
>>>> Settings were
>>>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL4 SlowConv" (or PAL8)
>>>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>>>> Print\[P_AtCharges_M\] 1 end"
>>>>
>>>>
>>>> #SBATCH --nodes=1
>>>> #SBATCH --ntasks=1
>>>> #SBATCH --cpus-per-task=36
>>>> #SBATCH --time=00:30:00
>>>> module load profile/archive
>>>> module load autoload openmpi/2.1.1--gnu--6.1.0 (without activating MPI,
>>>> the system complains that mpirun is unavailable and crashes. I must admit
>>>> I am confused about that, because for a single node MPI should not be
>>>> needed.)
>>>>
>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>>>
>>>> It seems that my settings are not providing enough hardware to ORCA,
>>>> despite the full node of 36 cores.
>>>>
>>>> Thanks for advice
>>>>
>>>> francesco
>>>>
>>>>
>>>>
>>>> On Thu, Jan 31, 2019 at 8:08 PM Marcelo C. R. Melo <melomcr_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Francesco,
>>>>>
>>>>> The first line in your namd.log says
>>>>> "Info: Running on 5 processors, 1 nodes, 1 physical nodes."
>>>>> which indicates NAMD is indeed using the 5 cores you requested with
>>>>> "+p5". Sometimes "top" will show just one process, but the CPU usage of
>>>>> that process will show 500%, for example, indicating 5 cores. This happens
>>>>> in some cluster management systems too.
>>>>>
>>>>> As for ORCA, your "qm config line" does not indicate that you are
>>>>> requesting multiple cores, so it is most likely really using just one. You
>>>>> should use the keyword "PAL?", where the question mark indicates the number
>>>>> of requested cores: use "PAL8", for example, to ask for 8 cores (see the
>>>>> example line below).
>>>>> You should become familiar with the commands that control how ORCA
>>>>> distributes its computations in a cluster (their manual is very good), as
>>>>> you may need to provide a "hostfile" indicating the name(s) of the node(s)
>>>>> where ORCA will find available processors. This is something every cluster
>>>>> makes available when the queuing system reserves nodes for a job, so you
>>>>> should find out how to access that in your cluster.
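>>>>> Concretely, appending PAL8 to your current first config line would give:
>>>>>
>>>>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"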
>>>>>
>>>>> As a final note, even in parallel, calculating 341 QM atoms (QM
>>>>> system + link atoms) using DFT will be slow. Really slow. Maybe not 10
>>>>> hours per timestep, but you just went from a medium-sized semi-empirical
>>>>> (parallel MOPAC) calculation to a large DFT one. Even in parallel, MOPAC
>>>>> could take a couple of seconds per timestep (depending on CPU power);
>>>>> ORCA/DFT will take much more than that.
>>>>>
>>>>> Best,
>>>>> Marcelo
>>>>> ---
>>>>> Marcelo Cardoso dos Reis Melo, PhD
>>>>> Postdoctoral Research Associate
>>>>> Luthey-Schulten Group
>>>>> University of Illinois at Urbana-Champaign
>>>>> crdsdsr2_at_illinois.edu
>>>>> +1 (217) 244-5983
>>>>>
>>>>>
>>>>> On Thu, 31 Jan 2019 at 12:27, Francesco Pietra <chiendarret_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello
>>>>>> Having obtained very good performance from NAMD(nightbuild)-MOPAC on
>>>>>> one cluster node for my system (large QM part, see below, including two iron
>>>>>> ions), I am now trying the same with NAMD(nightbuild)-ORCA on the same
>>>>>> cluster (36 cores across two sockets). So far I have been unable to get namd
>>>>>> and orca running on more than one core each.
>>>>>>
>>>>>> namd.conf
>>>>>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
>>>>>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1
>>>>>> Print\[P_AtCharges_M\] 1 end"
>>>>>> (SCF already converged by omitting "enGrad")
>>>>>>
>>>>>> namd.job
>>>>>> #SBATCH --nodes=1
>>>>>> #SBATCH --ntasks=1
>>>>>> #SBATCH --cpus-per-task=36
>>>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>>>> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>>>>>
>>>>>> namd.log
>>>>>> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
>>>>>> Info: Number of QM atoms (excluding Dummy atoms): 315
>>>>>> Info: We found 26 QM-MM bonds.
>>>>>> Info: Applying user defined multiplicity 1 to QM group ID 1
>>>>>> Info: 1) Group ID: 1 ; Group size: 315 atoms ; Total PSF charge: -1
>>>>>> Info: Found user defined charge 1 for QM group ID 1. Will ignore PSF
>>>>>> charge.
>>>>>> Info: MM-QM pair: 180:191 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 208:195 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 243:258 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 273:262 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 296:313 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 324:317 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 358:373 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 394:377 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 704:724 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 742:728 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 756:769 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 799:788 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 820:830 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 864:851 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 1461:1479 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 1511:1500 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 1532:1547 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 1566:1551 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 1933:1946 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 1991:1974 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 2011:2018 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 2050:2037 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 2072:2083 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 2098:2087 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 2139:2154 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> Info: MM-QM pair: 2174:2158 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>>>> TCL: Minimizing for 200 steps
>>>>>> Info: List of ranks running QM simulations: 0.
>>>>>> Nothing about affinity!! (which was clearly displayed in the MOPAC case)
>>>>>>
>>>>>> /0/qmm_0_input.TmpOut shows SCF ITERATIONS
>>>>>>
>>>>>> "top" shown a single PR for both namd and orca.
>>>>>> I had already tried a different job setting
>>>>>> #SBATCH --nodes=1
>>>>>> #SBATCH --ntasks-per-node=4
>>>>>> #SBATCH --ntasks-per-socket=2
>>>>>> module load profile/archive
>>>>>> module load autoload openmpi/2.1.1--gnu--6.1.0
>>>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>>>> namd-01.conf +p5 > namd-01.log
>>>>>>
>>>>>> Here too, "top" showed a single process for both namd and orca, so that in
>>>>>> about 20 hours namd.log had only reached "ENERGY 2", indicating that about
>>>>>> 1400 hours would be needed to complete the simulation.
>>>>>>
>>>>>> Thanks for advice
>>>>>> francesco pietra
>>>>>>
>>>>>>
>>>>>>
