Re: LES very slow

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Mar 04 2013 - 01:27:27 CST

Hi Siri,


did you use InfiniBand in this test?


Well, the numbers are not very accurate, but there shouldn’t be much
difference here. Do you use an ibverbs MPI or IPoIB? What’s the output of
“cat /sys/class/net/ib0/m*”?
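For reference, a small sketch of what that command inspects. On an IPoIB interface the files matched by m* usually include `mode` (datagram or connected) and `mtu`. These settings only matter if the MPI traffic actually runs over TCP on ib0 (IPoIB); an MPI built on ibverbs talks to the adapter directly and bypasses the IP interface. The interface name ib0 is taken from the command; `read_ib_sysfs` is a hypothetical helper, not part of NAMD or any InfiniBand tool.

```python
import glob
import os


def read_ib_sysfs(iface="ib0", sysfs_root="/sys/class/net"):
    """Return {filename: contents} for every regular file matching <iface>/m*.

    Roughly equivalent to `cat /sys/class/net/ib0/m*`.
    """
    pattern = os.path.join(sysfs_root, iface, "m*")
    result = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip subdirectories that happen to match m*
        try:
            with open(path) as fh:
                result[os.path.basename(path)] = fh.read().strip()
        except OSError:
            # some sysfs attributes are unreadable (permissions, unsupported)
            continue
    return result


if __name__ == "__main__":
    # Prints nothing if the interface does not exist on this machine.
    for name, value in read_ib_sysfs().items():
        print(f"{name}: {value}")
```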


Norman Geist.


From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Thursday, February 28, 2013 23:24
To: Norman Geist
Subject: Re: namd-l: LES very slow


This is the head of the log file:

Charm++> Running on MPI version: 2.1
Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running on non-SMP mode
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD 2.9 for Linux-x86_64-MPI
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
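The line "Running on 1 unique compute nodes (12-way SMP)" suggests that this particular run stayed on a single physical node, so the interconnect was not involved. A minimal sketch for pulling that layout out of a log, assuming the header format shown here; the regex and the `node_layout` name are illustrative assumptions, not a NAMD or Charm++ API.

```python
import re

# Matches the Charm++ startup line, e.g.
# "Charm++> Running on 1 unique compute nodes (12-way SMP)."
NODE_LINE = re.compile(r"Running on (\d+) unique compute nodes? \((\d+)-way SMP\)")


def node_layout(log_text):
    """Return (unique_nodes, cores_per_node) from a Charm++/NAMD log, or None."""
    match = NODE_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    header = "Charm++> Running on 1 unique compute nodes (12-way SMP).\n"
    print(node_layout(header))
```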


With 2 CPUs on one node, the time is now 11-12 days per ns.
With 1 CPU on each of two nodes, the time is 13 days per ns.

1 CPU on one node gives 18 days per ns today.
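A rough reading of these numbers for the two-process test, assuming days per ns is inversely proportional to throughput and taking both ends of the reported 11-12 range. The function names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup from costs in days per ns (lower cost = faster run)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup divided by the number of processes."""
    return speedup(t_serial, t_parallel) / n_procs


# Costs reported in this message, in days per ns.
serial = 18.0            # 1 CPU on one node
one_node = (11.0, 12.0)  # 2 CPUs on one node, reported as 11-12
two_nodes = 13.0         # 1 CPU on each of two nodes

for t in one_node:
    print(f"2 procs, 1 node  ({t:.0f} d/ns): speedup {speedup(serial, t):.2f}, "
          f"efficiency {efficiency(serial, t, 2):.0%}")
print(f"2 procs, 2 nodes ({two_nodes:.0f} d/ns): speedup {speedup(serial, two_nodes):.2f}, "
      f"efficiency {efficiency(serial, two_nodes, 2):.0%}")

# Extra cost of spreading the same two processes over two nodes.
for t in one_node:
    print(f"two-node penalty relative to {t:.0f} d/ns: {two_nodes / t - 1:.0%}")
```

With these inputs the two-node run comes out roughly 8-18% slower than the one-node run.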


2013/2/28 Norman Geist <norman.geist_at_uni-greifswald.de>

Hi again Siri,


OK, so your basic setup is fine. But what about LES? I just can’t imagine a
reason why this kind of simulation should be limited in scaling. You are right
about the 255 copies; it seems I had an older manual. Can we see the head of
the output of these LES simulations? Also, as a quick test of your node
interconnect, could you try the following with your LES simulation:


1. 2 cores on 1 node = 2 processes

2. 2 cores on 2 nodes (one per node) = 2 processes


That way we can see whether using your network makes a big difference.
Regardless, the scaling on one node should be better.


Perhaps the developers can jump in here and tell us whether there are known
scaling issues when using LES.


Norman Geist.


From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Wednesday, February 27, 2013 23:22
To: Norman Geist
Subject: Re: namd-l: LES very slow


Hi


The manual for NAMD 2.9 says that up to 255 copies are supported. When I run a
normal simulation on 1 CPU and on 12 CPUs, the estimated simulation time is
2.2 and 0.4 days per ns, respectively. If I increase the number to 24 CPUs (2
nodes times 12 CPUs), the estimated time is 0.2 days per ns. If I do the same
for the LES system, I get no decrease in simulation time.
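To put the normal-MD and LES numbers side by side, a small sketch that converts the reported costs into speedup and parallel efficiency relative to the 1-CPU run. It uses only values stated in this thread; the 24-CPU LES run is left out because no number was given for it, and `scaling_table` is an illustrative name.

```python
# Reported costs in days per ns, keyed by CPU count.
# Normal MD: this message. LES: the original post (1 and 12 CPUs only;
# the 24-CPU LES run is described as "no decrease" without a number).
runs = {
    "normal MD": {1: 2.2, 12: 0.4, 24: 0.2},
    "LES": {1: 9.0, 12: 4.0},
}


def scaling_table(timings):
    """Yield (cpus, speedup, efficiency) relative to the 1-CPU run."""
    base = timings[1]
    for cpus in sorted(timings):
        s = base / timings[cpus]
        yield cpus, s, s / cpus


for label, timings in runs.items():
    for cpus, s, eff in scaling_table(timings):
        print(f"{label:10s} {cpus:3d} CPUs: speedup {s:5.2f}, efficiency {eff:4.0%}")
```

With these inputs normal MD holds roughly 46% efficiency at both 12 and 24 CPUs, while LES drops to about 19% at 12 CPUs.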

2013/2/27 Norman Geist <norman.geist_at_uni-greifswald.de>

Hi Siri,


So far, I couldn’t find a reason for your problem in your hardware. I don’t
know what LES actually does, but the manual says that NAMD only supports up
to 15 copies.

Nevertheless, I can’t see a reason why this kind of computation should harm
NAMD’s otherwise good scaling. Does “normal” MD scale better? That way we can
tell whether it is a general problem with your setup or whether it is due to LES.


Regards

Norman Geist.


From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Wednesday, February 27, 2013 00:11
To: Norman Geist
Subject: Re: namd-l: LES very slow


I've attached the files... I hope this is what you were looking for.


2013/2/26 Norman Geist <norman.geist_at_uni-greifswald.de>

Hi Siri,


To help you, we could use some information about the hardware you are using.
Assuming you use Linux, please supply the output of the following
commands:


1. cat /proc/cpuinfo

2. lspci
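If the raw output is long, a sketch like the following can condense it, assuming a Linux system with pciutils installed. `summarize_cpuinfo` and `interconnect_lines` are hypothetical helpers, and the keyword list is only a guess at which devices matter for the node interconnect. Note that the `processor` entries in /proc/cpuinfo count logical CPUs, including hyper-threads.

```python
import os
import shutil
import subprocess


def summarize_cpuinfo(text):
    """Count logical CPUs and collect distinct model names from /proc/cpuinfo text."""
    processors = 0
    models = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "processor":
            processors += 1
        elif key == "model name":
            models.add(value.strip())
    return processors, sorted(models)


def interconnect_lines(lspci_text):
    """Keep lspci lines that look like InfiniBand or network adapters (assumed keywords)."""
    keywords = ("InfiniBand", "Ethernet controller", "Network controller")
    return [line for line in lspci_text.splitlines()
            if any(k in line for k in keywords)]


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as fh:
            count, models = summarize_cpuinfo(fh.read())
        print(f"{count} logical CPUs: {', '.join(models)}")
    if shutil.which("lspci"):
        out = subprocess.run(["lspci"], capture_output=True, text=True).stdout
        print("\n".join(interconnect_lines(out)))
```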


This should be enough to start with.


PS: If you are not using Linux, please provide information about your
hardware in some other way.


Norman Geist.


From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Siri Søndergaard
Sent: Tuesday, February 26, 2013 01:00
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: LES very slow


Hi


I'm trying to run LES on a system of ~30,000 atoms. I'm using 20 copies of
each of two dyes attached to DNA. The problem is that when I run the
simulation on more than one CPU, it does not scale accordingly. Going from 1
to 12 CPUs only reduces the simulation time from ~9 days to ~4 days per ns.
Does anybody know how to solve this?
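One standard way to quantify this kind of poor scaling, not used in the thread itself, is the Karp-Flatt metric: from a measured speedup S on p processors it estimates the effective serial fraction e = (1/S - 1/p) / (1 - 1/p). A minimal sketch using the numbers from this post; extrapolating with Amdahl's law assumes that fraction stays constant as CPUs are added, which is an assumption rather than something measured here.

```python
def karp_flatt(speedup, procs):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p); only defined for procs > 1.
    """
    if procs <= 1:
        raise ValueError("need more than one processor")
    return (1.0 / speedup - 1.0 / procs) / (1.0 - 1.0 / procs)


def amdahl_speedup(serial_fraction, procs):
    """Speedup predicted by Amdahl's law for a fixed serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / procs)


# LES run from this post: ~9 days/ns on 1 CPU, ~4 days/ns on 12 CPUs.
s_les = 9.0 / 4.0
e_les = karp_flatt(s_les, 12)
print(f"LES: speedup {s_les:.2f} on 12 CPUs, serial fraction ~{e_les:.2f}")
print(f"Amdahl ceiling if that fraction held: ~{1.0 / e_les:.1f}x; "
      f"predicted on 24 CPUs: ~{amdahl_speedup(e_les, 24):.2f}x")
```

For 9 to 4 days per ns on 12 CPUs this gives e of roughly 0.39, i.e. the LES run behaves as if about 40% of its work were serial.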


Best regards, Siri


This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:01 CST