From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Jan 24 2014 - 01:01:04 CST
Regarding your performance drop: if you have InfiniBand, check the output of

cat /sys/class/net/ib0/m*

on the compute nodes; this prints the IPoIB mode and the MTU. If it shows something like:

datagram
2044

ask your admin to change it back to:

connected
65520

and your performance issues may be resolved.
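
A minimal sketch of the check and the fix, assuming the interface is
named ib0 and a standard Linux IPoIB stack (run as root on each node;
the exact procedure depends on your distribution):

  # print the current IPoIB transport mode and MTU
  cat /sys/class/net/ib0/mode /sys/class/net/ib0/mtu

  # switch to connected mode and raise the MTU
  echo connected > /sys/class/net/ib0/mode
  ip link set ib0 mtu 65520

Connected mode supports the much larger MTU, which reduces per-packet
overhead for the bulk transfers NAMD generates.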
Norman Geist.
> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of JC Gumbart
> Sent: Thursday, 23 January 2014 18:14
> To: Hannes Loeffler
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: recent comparison of NAMD to other MD engines
>
> On one of the compute clusters that I use, the performance of NAMD
> dropped noticeably after a standard quarterly maintenance just this
> week. The people managing it are working on tracking it down, although
> it's hard to say what the cause is, as the same hardware, libraries,
> etc. are being used. My point, though, is that the performance of
> these advanced parallel codes depends on such an incredible number of
> variables that trying to draw general conclusions becomes a fool's
> errand.
>
> Ideally, one would choose the code based on its optimized performance
> on the hardware available (this requires either a good sysadmin or a
> lot of experience yourself!). In practice, though, we typically just
> use what we are most comfortable with. :)
>
> Regarding the specific issue in the paper, one possibility is that
> they are using SDR or DDR InfiniBand, in which case NAMD saturates
> the network quickly. Since they only say "Infiniband" we can't know
> for sure without asking.
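>
> (For anyone with access to such a system: a quick way to check,
> assuming a standard Linux InfiniBand stack, is
>
>     cat /sys/class/infiniband/*/ports/*/rate
>
> which prints something like "40 Gb/sec (4X QDR)". SDR and DDR 4x
> links carry only about 8 and 16 Gb/s of data respectively, which
> NAMD's communication can saturate at high core counts.)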
>
>
> On Jan 23, 2014, at 5:05 AM, Hannes Loeffler
> <Hannes.Loeffler_at_stfc.ac.uk> wrote:
>
> > On Wed, 22 Jan 2014 21:38:29 -0500
> > Giacomo Fiorin <giacomo.fiorin_at_gmail.com> wrote:
> >
> >> Without reading the paper in detail (I only saw Figures 6-8), I
> >> think you should try to obtain the original input files used for
> >> each program, in particular the cutoffs, PME grid resolution, time
> >> steps, etc.
> >>
> >> It is not rare to find that the input files do not use the same
> >> parameters: I once saw a comparison between program X running with
> >> an 8 Å cutoff and a 1.5 Å Ewald grid vs. program Y running with a
> >> 12 Å cutoff and a 0.8 Å Ewald grid. I'll leave it up to you to
> >> judge the accuracy of such a comparison.
> >>
> >> Ultimately, benchmarks should be treated like any other scientific
> >> data: they must be reproducible.
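> >>
> >> As an illustration of what to compare, the relevant knobs in a
> >> NAMD input file look like the following (the values here are
> >> placeholders, not the settings used in the paper):
> >>
> >>     timestep        2.0
> >>     cutoff          12.0
> >>     switching       on
> >>     switchdist      10.0
> >>     pairlistdist    14.0
> >>     PME             yes
> >>     PMEGridSpacing  1.0
> >>
> >> GROMACS and CHARMM have equivalent parameters, and all of them
> >> should match before a speed comparison is meaningful.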
> >
> >
> > The paper appears to be based entirely on my benchmark suite, which
> > is readily available with all necessary input files from
> > http://www.stfc.ac.uk/CSE/randd/cbg/Benchmark/25241.aspx . So
> > reproducibility shouldn't be a problem, provided the authors haven't
> > changed the input parameters (except, presumably, as needed for the
> > new CHARMM code) or have documented any such changes. In the reports
> > accompanying the suite I also tried to make it as clear as possible
> > that the user should be careful about comparisons, and I tried to
> > give advice on how performance could be improved. There I also
> > encourage users to benchmark their own simulation systems on their
> > chosen hardware.
> >
> > Those benchmarks, in particular Fig. 8, may show some weakness of
> > NAMD on this particular hardware configuration (the NAMD developers
> > may be able to comment on that), or they could point to a problem
> > with how the benchmarks were run. Regarding hardware, the benchmarks
> > may look quite different on different machines, i.e. they only tell
> > us how the codes performed on the specific hardware the authors
> > chose (that's certainly a limiting factor, and the authors don't
> > tell us much about their hardware). I personally have never seen a
> > drop in NAMD performance as "dramatic" (see below for comments on
> > that) as the one depicted in Fig. 8, but then I should probably add
> > that most of my benchmarking was done on supercomputers.
> >
> > It should also be noted that in Fig. 8 the absolute performance of
> > NAMD at 256 cores is still slightly better than that of GROMACS at
> > 512 cores. We don't see where GROMACS' performance peaks or where
> > it starts to drop. Also, at 256 cores NAMD runs twice as fast as
> > GROMACS! If you have to pay for your usage, you will probably think
> > twice about whether to run NAMD or GROMACS on that particular
> > system on that particular hardware. You really ought to consider
> > how many cores you can afford in actual work: good scientific
> > practice dictates multiple independent runs for statistics and
> > reproducibility, and you probably also have to compete for
> > resources with other users, etc. It's interesting to see that
> > GROMACS appears to perform so badly on a per-core basis, as that's
> > quite the opposite of what I have seen so far.
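> >
> > To put numbers on the cost argument (made-up figures, purely for
> > illustration): if code A delivers 10 ns/day on 256 cores and code B
> > needs 512 cores for the same 10 ns/day, code A costs
> > 256 * 24 / 10 = ~614 core-hours per ns, while code B costs
> > 512 * 24 / 10 = ~1229, i.e. twice the allocation for the same
> > trajectory.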
> >
> > There is probably much more that I could say here; partly I have
> > tried to discuss this in my benchmark reports. But the summary is
> > that users should look very carefully at what benchmark data really
> > mean, and in particular at what they mean for their own personal
> > circumstances. Personally, I don't see that the performance of NAMD
> > in Fig. 8 is a problem in practical work. In fact, I would say NAMD
> > looks really great.
> >
> > Cheers,
> > Hannes.
> >
> >
> >> On Wed, Jan 22, 2014 at 7:22 PM, Bennion, Brian <Bennion1_at_llnl.gov>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> Based on this recent publication
> >>> http://onlinelibrary.wiley.com/doi/10.1002/jcc.23501/abstract
> >>>
> >>> NAMD 2.9 stumbles compared with GROMACS and an improved version of
> >>> CHARMM on a large system (465,404 atoms and 500 cores).
> >>>
> >>> Any ideas as to the cause of this dramatic difference in speed
> >>> between 256 and 400 cores?
> >>>
> >>> Brian