Re: optimising namd ibverb runs

From: Aaron Larsen (alarsen_at_molbio.mgh.harvard.edu)
Date: Mon May 18 2015 - 08:48:18 CDT

Hi all,

A question about parameterization using the Force Field Toolkit:

I'm currently attempting to determine parameters for some rather
odd-looking nucleobases. Everything is working well, but I am left with 3
theoretical questions:

1) The keto form is likely going to be the lowest in energy, so I am using
this form for the coordinates. However, it has occurred to me that the sp3
tetrahedral geometry of the amino groups does not correspond to the planar
geometry conventionally accepted for nucleobase amino groups. Is it
recommended to force planar geometry here?

2) For the water interactions, people typically seem to look only at the
interactions with the exterior groups decorating the nucleobases, i.e. the
carbonyl oxygens, amino protons, sp2 protons, etc. They seem to ignore
water interactions with the carbons and nitrogens of the rings themselves.
Is this recommended?

3) For computational savings, it appears to be conventional to parameterize
just the base with no regard for the sugar. Might this result in incorrect
parameters, particularly the partial charge at the N1 position?

Best,
Aaron

On Mon, May 18, 2015 at 9:16 AM, Norman Geist <
norman.geist_at_uni-greifswald.de> wrote:

> Hi,
>
>
>
> You may want to disable Hyper-Threading. Hyper-Threading (or logical
> cores) brings no benefit in HPC, and since the OS does not distinguish
> between real and logical cores, it cannot distribute the processes well. As
> an alternative, you can use e.g. taskset to pin the run to the non-shared
> cores.
>
>
>
> Example (depending on core layout; benchmark which is fastest):
>
>
>
> charmrun +p 192 +nodelist yournodelistfile taskset -c
> 0-7,16-23,32-39,48-55 namd2 your.in
>
>
>
> or
>
>
>
> charmrun +p 192 +nodelist yournodelistfile taskset -c
> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
> namd2 your.in
>
>
>
> Personally I think it's easier to just disable HT in the BIOS.
>
> Also have a look at Amdahl's law. You are unlikely to get fully linear
> scaling, depending on system size. A penalty of about 20% is normal for a
> reasonable number of cores for a given system size.
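>
> As a rough illustration (the formula is Amdahl's law; the numbers are
> only illustrative, not from any benchmark): if a fraction p of the work
> parallelizes, the speedup on N cores is
>
>   S(N) = 1 / ((1 - p) + p/N)
>
> With p = 0.998 and N = 128, S = 1 / (0.002 + 0.998/128) ~ 102, i.e. about
> 80% parallel efficiency, consistent with the ~20% penalty mentioned above.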
>
>
>
> Norman Geist.
>
>
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of *Thanassis Silis
> *Sent:* Monday, May 18, 2015 1:48 PM
> *To:* namd-l_at_ks.uiuc.edu
> *Subject:* namd-l: optimising namd ibverb runs
>
>
>
> Hello everyone,
> I am running some relatively small andm simulations in a system of 6 blade
> processing servers. Each has the following specs
>
> POWEREDGE M910 BLADE SERVER
> (4x) INTEL XEON E7-4830 PROCESSOR (2.13GHZ)
> 256GB MEMORY FOR 2/4CPU
> 900GB, SAS 6GBPS, 2.5-IN, 10K RPM HARD DISK
> MELLANOX CONNECT X3 QDR 40GBPS INFINIBAND
>
> Each of the 4 processors has 8 cores, and due to hyper-threading 16
> threads are available per processor.
> Indeed, cat /proc/cpuinfo reports 64 CPUs on each system.
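>
> (For reference, a quick way to see which logical CPUs are hyper-thread
> siblings of the same physical core, assuming a reasonably recent
> util-linux, is:
>
> lscpu -e=CPU,CORE,SOCKET
> cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
>
> Logical CPUs that report the same CORE/SOCKET pair, or that appear
> together in a thread_siblings_list file, share one physical core.)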
>
> I have created a nodelist file using the InfiniBand interface IP address; I
> am also using the ibverbs namd executable. I have run several test
> simulations to figure out which setting minimizes processing time. Overall
> it seems that for 64 CPUs/system * 6 systems = 384 CPUs, the processing
> time is minimized by using "+p128 +setcpuaffinity".
>
> This seems odd, as it is 1/3 of the available CPUs. It is not half, which
> would seem sensible (if only one of each core's two threads is working, it
> can use the full resources otherwise shared with the other thread, which
> should maximize performance).
>
> One of the things I tried was to let the system decide which CPUs to use,
> with
> charmrun namd2 ++nodelist nodelist +setcpuaffinity `numactl --show | awk
> '/^physcpubind/ {printf "+p%d +pemap %d",(NF-1),$2;
> for(i=3;i<=NF;++i){printf ",%d",$i}}'` sim.conf > sim.log
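>
> (If I read the awk correctly, on a node where numactl --show reports
> "physcpubind: 0 1 2 ... 63" this expands to roughly
>
> +p64 +pemap 0,1,2,...,62,63
>
> i.e. one PE pinned to every logical CPU reported on that node.)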
>
> and also to manually assign worker threads and communication threads. I
> may (or may not!) have managed that with the command
> charmrun namd2 ++nodelist nodelist +setcpuaffinity +p64 +pemap 0-63:16.15
> +commap 15-63:16
> In the above command, I am not sure how I should "see" the 64 * 6 CPUs:
> as 6 identical systems (so use +p64), or aggregated into 384 CPUs (so use
> +p384)? I did try +p384, but it seems to be even slower; far too many
> threads were spawned.
>
> So I am fuzzy on this. Why do I get the minimum processing time when 1/3
> of the 384 CPUs are used and no manual settings are in place? Are charmrun
> and namd2 clever enough in this version (2.10) to assign worker and comm
> threads automagically?
>
> Is there some other parameter you would suggest I append? At the very
> least, using 1/3 of the CPUs seems very odd.
>
> Thank you for your time and input.
>

-- 
Aaron Larsen, Ph.D.
Harvard University Department of Chemistry and Chemical Biology
Harvard Medical School Department of Genetics
E-mail: alarsen_at_molbio.mgh.harvard.edu
Mobile: 617-319-3782
FAX: 617-643-3328
