Re: Running QM-MM MOPAC on a cluster

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Mon Jan 07 2019 - 10:40:56 CST

Thanks for the great news!

I suspect that "numactl -C +5-35 ..." is failing because it is conflicting
with the affinity set by NAMD. I think the following should work since
the -a option ignores the current affinity of the launching thread. Also
note that the + is removed so these are absolute cpu ids.

qmExecPath "numactl -a -C 5-35 ..."
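
Spelled out together with the launch line from your script, the whole thing
would look roughly like this (the cpu ids assume NAMD's five threads stay
pinned to cpus 0-4, as your +showcpuaffinity output indicates):

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36

/galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log

and in namd-01.conf:

qmExecPath "numactl -a -C 5-35 /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"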

Jim

On Fri, 4 Jan 2019, Francesco Pietra wrote:

> Slurm setting
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=36
>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>
> NAMD setting
> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD QMMM
> GEO-OK THREADS=24"
>
> # qmExecPath "numactl -C +5-35
> /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>
> qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>
> NAMD log
> [1] pthread affinity is: 1
> [3] pthread affinity is: 3
> [0] pthread affinity is: 0
> [2] pthread affinity is: 2
> [4] pthread affinity is: 4
>
> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
>
> TIMING: 12926 CPU: 666.423, 0.050845/step
> TIMING: 14828 CPU: 763.82, 0.045536/step
> TIMING: 19676 CPU: 1013.25, 0.050659/step
>
> WallClock: 1049.411743 CPUTime: 1040.567749 Memory: 432.250000 MB
>
> This is roughly ten times faster than in my previous trials, which seems
> remarkably good to me considering that I am not using /dev/shm
> (I was unable to set it up; I have asked the cluster support whether it is
> possible or useful on the node).
>
> VARIANTS:
> --with THREADS=30 for MOPAC it was a bit slower:
> TIMING: 13822 CPU: 720.347, 0.052429/step Wall: 726.456, 0.055537/step,
> perhaps because the Polyala tutorial system is small.
> --with ten cores assigned to NAMD it was somewhat slower.
>
> --I was unable to get numactl working when I interpreted your suggestion
> as "+<cores given to NAMD>-<total cores minus one>", i.e. "+5-35", as follows:
> qmExecPath "numactl -C +5-35
> /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>
> # qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>
>
> ------------ Processor 2 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: Error running command for QM forces calculation.
>
> Charm++ fatal error:
> FATAL ERROR: Error running command for QM forces calculation.
>
> /var/spool/slurmd/job582681/slurm_script: line 14: 957 Aborted
>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>
> Thanks a lot for these advanced lessons
>
> francesco
>
>
> On Wed, Jan 2, 2019 at 11:40 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
>
>>
>> For starters, use the faster settings from the previous emails:
>>
>>> #SBATCH --ntasks=1
>>> #SBATCH --cpus-per-task=34
>>
>> For a little more information add +showcpuaffinity.
>>
>> I suspect that +setcpuaffinity isn't looking at the limits on affinity
>> that are enforced by the queueing system, so it's trying to use a
>> forbidden cpu. If you request all cores on the node with
>> --cpus-per-task=36 that might make the problem go away.
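>>
>> In other words, something like this (untested; paths as in your script):
>>
>> #SBATCH --nodes=1
>> #SBATCH --ntasks=1
>> #SBATCH --cpus-per-task=36
>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p10 +setcpuaffinity +showcpuaffinity > namd-01.log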
>>
>> Jim
>>
>>
>> On Tue, 1 Jan 2019, Francesco Pietra wrote:
>>
>>> Thanks a lot for these suggestions. There must be some restriction
>>> preventing the suggested settings from working. The Slurm script,
>>> namd-01.conf, and the error are shown below, in that order:
>>>
>>> #!/bin/bash
>>> #SBATCH --nodes=1
>>> #SBATCH --ntasks=10
>>> #SBATCH --cpus-per-task=1
>>> #SBATCH --time=00:30:00
>>> #SBATCH --job-name=namd-01
>>> #SBATCH --output namd-01.out
>>> #SBATCH --error namd-01.err
>>> #SBATCH --partition=gll_usr_prod
>>> #SBATCH --mem=115GB
>>> #SBATCH --account=IscrC_QMMM-FER_1
>>> # goto launch directory
>>> cd $SLURM_SUBMIT_DIR
>>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>> namd-01.conf +p10 +setcpuaffinity > namd-01.log
>>>
>>> qmExecPath "numactl -C +10-33
>>> /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>>>
>>> $ cat *err
>>> pthread_setaffinity: Invalid argument
>>> pthread_setaffinity: Invalid argument
>>> pthread_setaffinity: Invalid argument
>>> ------------- Processor 7 Exiting: Called CmiAbort ------------
>>> Reason: set cpu affinity abort!
>>>
>>> Charm++ fatal error:
>>> set cpu affinity abort!
>>>
>>> /var/spool/slurmd/job540826/slurm_script: line 14: 21114 Segmentation
>>> fault
>>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>> namd-01.conf +p10 +setcpuaffinity > namd-01.log
>>>
>>> fp
>>>
>>> On Mon, Dec 31, 2018 at 4:42 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
>>>
>>>>
>>>> Well, that's progress at least. I have one other idea to ensure that
>> NAMD
>>>> and MOPAC aren't competing with each other for the same cores:
>>>>
>>>> 1) Add "+setcpuaffinity" to the NAMD command line before ">".
>>>>
>>>> 2) Add "numactl -C +10-33" to the beginning of qmExecPath in
>> namd-01.conf
>>>> (quote the string, e.g., "numactl -C +10-33 /path/to/MOPAC.exe")
>>>>
>>>> This should keep NAMD on your first ten cores and MOPAC on the next 24.
>>>>
>>>> What is qmBaseDir set to? Something in /dev/shm is the best choice. If
>>>> qmBaseDir is on a network filesystem that could slow things down.
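>>>>
>>>> For example, in namd-01.conf (the directory name below is only an
>>>> illustration; anything under /dev/shm on the compute node should work):
>>>>
>>>> qmBaseDir  "/dev/shm/qmmm-scratch"
>>>> qmExecPath "numactl -C +10-33 /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"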
>>>>
>>>> Jim
>>>>
>>>>
>>>> On Fri, 21 Dec 2018, Francesco Pietra wrote:
>>>>
>>>>> I finally learned how to ssh to a given node. The results for
>>>>> #SBATCH --nodes=1
>>>>> #SBATCH --ntasks=10
>>>>> #SBATCH --cpus-per-task=1
>>>>>
>>>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>>> namd-01.conf +p10 > namd-01.log
>>>>>
>>>>> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD QMMM
>>>>> GEO-OK THREADS=24"
>>>>>
>>>>> are
>>>>>
>>>>> ssh node181
>>>>> namd %cpu 720-750
>>>>> mopac %cpu 1-30
>>>>> 1 (per-core load):
>>>>> %Cpu0-4: 90-100
>>>>> %Cpu18-22: 60-100
>>>>> %Cpu5-17: 0.0
>>>>> %Cpu23-34: 0.0
>>>>>
>>>>> namd.log: 0.5/step (at 11min executed 1346 steps)
>>>>> ______________________
>>>>> As above, only changing
>>>>>
>>>>> SBATCH --nodes=1
>>>>> #SBATCH --ntasks=1
>>>>> #SBATCH --cpus-per-task=34
>>>>>
>>>>> ssh node181
>>>>> namd %cpu 900
>>>>> mopac %cpu 0-34
>>>>> 1 (per-core load):
>>>>> %Cpu0-34: 0.3-100.0
>>>>>
>>>>> namd.log: 0.3/step (at 11min executed 2080 steps)
>>>>>
>>>>> Despite all CPUs being used, the performance is disappointing. I can't say
>>>>> whether NAMD and MOPAC compete, at least in part, for the same cores.
>>>>>
>>>>> francesco
>>>>>
>>>>>
>>>>> On Mon, Dec 17, 2018 at 4:12 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
>>>>>
>>>>>>
>>>>>> Since you are asking Slurm for 10 tasks with 1 cpu-per-task it is
>>>> possible
>>>>>> that all 34 threads are running on a single core. You can check this
>>>> with
>>>>>> top (hit "1" to see per-core load) if you can ssh to the execution
>> host.
>>>>>>
>>>>>> You should probably request --ntasks=1 --cpus-per-task=34 (or 36) so
>>>> that
>>>>>> Slurm will allocate all of the cores you wish to use. The number of
>>>> cores
>>>>>> used by NAMD is controlled by +p10 and you will need THREADS=24 for
>>>> MOPAC.
>>>>>>
>>>>>> It is a good idea to use top to confirm that all cores are being used.
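>>>>>>
>>>>>> For example (the numbers are just one possible split on a 36-core node):
>>>>>>
>>>>>> #SBATCH --ntasks=1
>>>>>> #SBATCH --cpus-per-task=34
>>>>>>
>>>>>> namd2 namd-01.conf +p10 > namd-01.log
>>>>>>
>>>>>> with THREADS=24 on the qmConfigLine, so 10 + 24 = 34 cores in total.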
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>> On Sun, 16 Dec 2018, Francesco Pietra wrote:
>>>>>>
>>>>>>> I had already taken the relative number of threads into consideration, by
>>>>>>> setting them explicitly for MOPAC as well.
>>>>>>> Out of the many such trials, namd.config:
>>>>>>>
>>>>>>> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
>> QMMM
>>>>>>> GEO-OK THREADS=24"
>>>>>>>
>>>>>>> qmExecPath
>>>> "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>>>>>>>
>>>>>>> corresponding SLURM:
>>>>>>> #SBATCH --nodes=1
>>>>>>> #SBATCH --ntasks=10
>>>>>>> #SBATCH --cpus-per-task=1
>>>>>>>
>>>>>>
>>>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>>>>> namd-01.conf +p10 > namd-01.log
>>>>>>>
>>>>>>> Thus, 24+10=34, while the number of cores on the node was 36. Again,
>>>>>>> execution took nearly two hours, slower than on my vintage VAIO with two
>>>>>>> cores (an hour and a half).
>>>>>>>
>>>>>>> As to MKL_NUM_THREADS, I am lost: there is no such environment variable
>>>>>>> in MOPAC's keyword list. On the other hand, the NAMD nightly build I used
>>>>>>> performs as effectively as it should in classical MD simulations on one
>>>>>>> node of the same cluster.
>>>>>>>
>>>>>>> thanks
>>>>>>> fp
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 14, 2018 at 4:29 PM Jim Phillips <jim_at_ks.uiuc.edu>
>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> The performance of a QM/MM simulation is typically limited by the QM
>>>>>>>> program, not the MD program. Do you know how many threads MOPAC is
>>>>>>>> launching? Do you need to set the MKL_NUM_THREADS environment
>>>> variable?
>>>>>>>> You want the number of NAMD threads (+p#) plus the number of MOPAC
>>>>>> threads
>>>>>>>> to be less than the number of cores on your machine.
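>>>>>>>>
>>>>>>>> If MOPAC is linked against MKL, you could try exporting the variable in
>>>>>>>> the job script before the namd2 line, with the value set to the number of
>>>>>>>> MOPAC threads you want (just a guess; MOPAC may control its threads some
>>>>>>>> other way):
>>>>>>>>
>>>>>>>> export MKL_NUM_THREADS=24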
>>>>>>>>
>>>>>>>> Jim
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 14 Dec 2018, Francesco Pietra wrote:
>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>> I resumed my attempts at finding the best settings for running NAMD QM/MM
>>>>>>>>> on a cluster, using Example 1 (Polyala).
>>>>>>>>>
>>>>>>>>> In order to use the NAMD 2.13 multicore nightly build, I was limited to a
>>>>>>>>> single multicore node: 2 x 18-core Intel(R) Xeon(R) E5-2697 v4 @ 2.30GHz
>>>>>>>>> with 128 GB RAM (Broadwell).
>>>>>>>>>
>>>>>>>>> Settings
>>>>>>>>> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
>>>> QMMM
>>>>>>>>> GEO-OK"
>>>>>>>>>
>>>>>>>>> qmExecPath
>>>>>> "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
>>>>>>>>>
>>>>>>>>> of course, on the cluster the simulation can't be run in /dev/shm
>>>>>>>>>
>>>>>>>>> execution line
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
>>>>>>>>> namd-01.conf +p# > namd-01.log
>>>>>>>>>
>>>>>>>>> where # was either 4, 10, 15, 36
>>>>>>>>>
>>>>>>>>> With either 36 or 15 cores: segmentation fault.
>>>>>>>>>
>>>>>>>>> With either 4 or 10 cores, execution of the 20,000 steps of Example 1 took
>>>>>>>>> nearly two hours. According to the .out file in folder /0, each MOPAC run
>>>>>>>>> took 0.18 seconds.
>>>>>>>>>
>>>>>>>>> My question is what is wrong with my setup, since I cannot rationalize such
>>>>>>>>> disappointing performance.
>>>>>>>>>
>>>>>>>>> Thanks for advice
>>>>>>>>>
>>>>>>>>> francesco pietra
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
