From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Fri Jan 04 2019 - 12:28:32 CST
Slurm settings:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36

/galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 \
  namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
NAMD settings:

qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD QMMM GEO-OK THREADS=24"
# qmExecPath "numactl -C +5-35 /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
NAMD log
[1] pthread affinity is: 1
[3] pthread affinity is: 3
[0] pthread affinity is: 0
[2] pthread affinity is: 2
[4] pthread affinity is: 4
Info: Running on 5 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
TIMING: 12926 CPU: 666.423, 0.050845/step
TIMING: 14828 CPU: 763.82, 0.045536/step
TIMING: 19676 CPU: 1013.25, 0.050659/step
WallClock: 1049.411743 CPUTime: 1040.567749 Memory: 432.250000 MB
This is roughly ten times faster than my previous trials, which seems remarkably good given that /dev/shm is not being used (I was unable to set it up; I have asked the cluster support whether it is possible, or useful, on the compute nodes).
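If /dev/shm does turn out to be usable on the compute nodes, my understanding from Jim's earlier mail is that it would be pointed at through qmBaseDir in the NAMD config, roughly like this (a sketch only; the directory name is just an example):

# keep the QM input/output files in node-local shared memory rather than on
# the network filesystem (directory name is only an example)
qmBaseDir "/dev/shm/NAMD_qmmm"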
VARIANTS:
--With THREADS=30 for MOPAC it was a bit slower:
TIMING: 13822 CPU: 720.347, 0.052429/step Wall: 726.456, 0.055537/step
perhaps because the Polyala tutorial system is small.
--Assigning ten cores to NAMD was also somewhat slower.
--I was unable to get numactl to work when interpreting your suggestion as
+<number_of_cores_for_namd>-<total_number_of_cores_minus_one>, as follows
(a possible workaround is sketched after the error output below):
qmExecPath "numactl -C +5-35 /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
# qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
------------ Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: Error running command for QM forces calculation.
Charm++ fatal error:
FATAL ERROR: Error running command for QM forces calculation.
/var/spool/slurmd/job582681/slurm_script: line 14: 957 Aborted
/galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
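The workaround I may still try (a sketch only, assuming, without having verified it, that the abort comes from qmExecPath receiving a command string with arguments rather than a plain executable path) is to hide numactl behind a small wrapper script:

#!/bin/bash
# run_mopac.sh -- hypothetical wrapper; the relative range +5-35 is taken
# within the cpuset Slurm grants the job, leaving cores 0-4 to NAMD (+p5)
exec numactl -C +5-35 /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe "$@"

and then, after chmod +x, point qmExecPath at the wrapper instead of at MOPAC2016.exe directly.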
Thanks a lot for these advanced lessons
francesco
On Wed, Jan 2, 2019 at 11:40 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
>
> For starters, use the faster settings from the previous emails:
>
> > #SBATCH --ntasks=1
> > #SBATCH --cpus-per-task=34
>
> For a little more information add +showcpuaffinity.
>
> I suspect that +setcpuaffinity isn't looking at the limits on affinity
> that are enforced by the queueing system, so it's trying to use a
> forbidden cpu. If you request all cores on the node with
> --cpus-per-task=36 that might make the problem go away.
>
> Jim
>
>
> On Tue, 1 Jan 2019, Francesco Pietra wrote:
>
> > Thanks a lot for these suggestions. There must be some restriction
> > hindering the suggested settings. Slurm, namd-01.conf, and error are shown
> > below in the given order:
> >
> > #!/bin/bash
> > #SBATCH --nodes=1
> > #SBATCH --ntasks=10
> > #SBATCH --cpus-per-task=1
> > #SBATCH --time=00:30:00
> > #SBATCH --job-name=namd-01
> > #SBATCH --output namd-01.out
> > #SBATCH --error namd-01.err
> > #SBATCH --partition=gll_usr_prod
> > #SBATCH --mem=115GB
> > #SBATCH --account=IscrC_QMMM-FER_1
> > # goto launch directory
> > cd $SLURM_SUBMIT_DIR
> >
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> > namd-01.conf +p10 +setcpuaffinity > namd-01.log
> >
> > qmExecPath "numactl -C +10-33
> > /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >
> > $ cat *err
> > pthread_setaffinity: Invalid argument
> > pthread_setaffinity: Invalid argument
> > pthread_setaffinity: Invalid argument
> > ------------- Processor 7 Exiting: Called CmiAbort ------------
> > Reason: set cpu affinity abort!
> >
> > Charm++ fatal error:
> > set cpu affinity abort!
> >
> > /var/spool/slurmd/job540826/slurm_script: line 14: 21114 Segmentation
> > fault
> >
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> > namd-01.conf +p10 +setcpuaffinity > namd-01.log
> >
> > fp
> >
> > On Mon, Dec 31, 2018 at 4:42 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> >
> >>
> >> Well, that's progress at least. I have one other idea to ensure that
> >> NAMD and MOPAC aren't competing with each other for the same cores:
> >>
> >> 1) Add "+setcpuaffinity" to the NAMD command line before ">".
> >>
> >> 2) Add "numactl -C +10-33" to the beginning of qmExecPath in
> namd-01.conf
> >> (quote the string, e.g., "numactl -C +10-33 /path/to/MOPAC.exe")
> >>
> >> This should keep NAMD on your first ten cores and MOPAC on the next 24.
> >>
> >> What is qmBaseDir set to? Something in /dev/shm is the best choice. If
> >> qmBaseDir is on a network filesystem that could slow things down.
> >>
> >> Jim
> >>
> >>
> >> On Fri, 21 Dec 2018, Francesco Pietra wrote:
> >>
> >>> I finally learned how to ssh to a given node. The results for
> >>> #SBATCH --nodes=1
> >>> #SBATCH --ntasks=10
> >>> #SBATCH --cpus-per-task=1
> >>>
> >>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>> namd-01.conf +p10 > namd-01.log
> >>>
> >>> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD QMMM
> >>> GEO-OK THREADS=24"
> >>>
> >>> are
> >>>
> >>> ssh node181
> >>> namd %cpu 720-750
> >>> mopac %cpu 1-30
> >>> 1 (per-core load):
> >>> %Cpu0-4: 90-100
> >>> %Cpu18-22: 60-100
> >>> %Cpu5-17: 0.0
> >>> %Cpu23-34: 0.0
> >>>
> >>> namd.log: 0.5/step (at 11min executed 1346 steps)
> >>> ______________________
> >>> As above, only changing
> >>>
> >>> SBATCH --nodes=1
> >>> #SBATCH --ntasks=1
> >>> #SBATCH --cpus-per-task=34
> >>>
> >>> ssh node181
> >>> namd %cpu 900
> >>> mopac %cpu 0-34
> >>> 1
> >>> %Cpu0-34: 0.3-100.0
> >>>
> >>> namd.log: 0.3/step (at 11min executed 2080 steps)
> >>>
> >>> Despite all CPUs being used, the performance is disappointing. I can't
> >>> say whether NAMD and MOPAC compete, at least in part, for the same cores.
> >>>
> >>> francesco
> >>>
> >>>
> >>> On Mon, Dec 17, 2018 at 4:12 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> >>>
> >>>>
> >>>> Since you are asking Slurm for 10 tasks with 1 cpu-per-task it is
> >>>> possible that all 34 threads are running on a single core. You can check
> >>>> this with top (hit "1" to see per-core load) if you can ssh to the
> >>>> execution host.
> >>>>
> >>>> You should probably request --ntasks=1 --cpus-per-task=34 (or 36) so that
> >>>> Slurm will allocate all of the cores you wish to use. The number of cores
> >>>> used by NAMD is controlled by +p10 and you will need THREADS=24 for
> >>>> MOPAC.
> >>>>
> >>>> It is a good idea to use top to confirm that all cores are being used.
> >>>>
> >>>> Jim
> >>>>
> >>>>
> >>>> On Sun, 16 Dec 2018, Francesco Pietra wrote:
> >>>>
> >>>>> I had earlier taken the relative number of threads into account, by
> >>>>> setting them for MOPAC as well.
> >>>>> Out of the many such trials, namd.config:
> >>>>>
> >>>>> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
> QMMM
> >>>>> GEO-OK THREADS=24"
> >>>>>
> >>>>> qmExecPath
> >> "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >>>>>
> >>>>> corresponding SLURM:
> >>>>> #SBATCH --nodes=1
> >>>>> #SBATCH --ntasks=10
> >>>>> #SBATCH --cpus-per-task=1
> >>>>>
> >>>>
> >>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>>>> namd-01.conf +p10 > namd-01.log
> >>>>>
> >>>>> Thus 24+10=34 threads, while the node has 36 cores. Again, execution
> >>>>> took nearly two hours, slower than on my vintage VAIO with two cores
> >>>>> (one and a half hours).
> >>>>>
> >>>>> As to MKL_NUM_THREADS, I am lost: there is no such environment variable
> >>>>> in MOPAC's list. On the other hand, the NAMD nightly build I used
> >>>>> performs as effectively as it should with classical MD simulations on
> >>>>> one node of the same cluster.
> >>>>>
> >>>>> thanks
> >>>>> fp
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Dec 14, 2018 at 4:29 PM Jim Phillips <jim_at_ks.uiuc.edu>
> wrote:
> >>>>>
> >>>>>>
> >>>>>> The performance of a QM/MM simulation is typically limited by the QM
> >>>>>> program, not the MD program. Do you know how many threads MOPAC is
> >>>>>> launching? Do you need to set the MKL_NUM_THREADS environment variable?
> >>>>>> You want the number of NAMD threads (+p#) plus the number of MOPAC
> >>>>>> threads to be less than the number of cores on your machine.
> >>>>>>
> >>>>>> Jim
> >>>>>>
> >>>>>>
> >>>>>> On Fri, 14 Dec 2018, Francesco Pietra wrote:
> >>>>>>
> >>>>>>> Hi all
> >>>>>>> I resumed my attempts at finding the best settings for running NAMD
> >>>>>>> QM/MM on a cluster (I used Example 1, Polyala).
> >>>>>>>
> >>>>>>> In order to use the NAMD 2.13 multicore nightly build, I was limited
> >>>>>>> to a single multicore node: 2 x 18-core Intel(R) Xeon(R) E5-2697 v4 @
> >>>>>>> 2.30GHz with 128 GB RAM (Broadwell).
> >>>>>>>
> >>>>>>> Settings
> >>>>>>> qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
> >> QMMM
> >>>>>>> GEO-OK"
> >>>>>>>
> >>>>>>> qmExecPath
> >>>> "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >>>>>>>
> >>>>>>> Of course, on the cluster the simulation can't be run in /dev/shm.
> >>>>>>>
> >>>>>>> execution line
> >>>>>>>
> >>>>>>
> >>>>
> >>
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>>>>>> namd-01.conf +p# > namd-01.log
> >>>>>>>
> >>>>>>> where # was either 4, 10, 15, 36
> >>>>>>>
> >>>>>>> With either 36 or 15 cores: segmentation fault.
> >>>>>>>
> >>>>>>> With either 4 or 10 cores, execution of the 20,000 steps of Example 1
> >>>>>>> took nearly two hours. From the .ou file in folder /0, the execution
> >>>>>>> took 0.18 seconds.
> >>>>>>>
> >>>>>>> My question is: what is wrong in my setup that could explain such
> >>>>>>> disappointing performance?
> >>>>>>>
> >>>>>>> Thanks for advice
> >>>>>>>
> >>>>>>> francesco pietra
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>