From: Vermaas, Joshua (Joshua.Vermaas_at_nrel.gov)
Date: Fri Sep 29 2017 - 16:49:40 CDT
There are some options you can play with. The one I've found that works
best for my systems is +ppn4. I *think* the reason this works well is
that its communication and caching patterns fit well with the faster
clustering modes (most notably quadrant), but it was definitely one of
the arguments I varied to see what ran faster.
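
A minimal sketch of that kind of trial-and-error, in case it is useful.
It only assembles and times command lines that follow the pattern in the
quoted message (+ppnN +setcpuaffinity +pemap 0-(N-1)). The config file
name "bench.namd", the binary name, and the helper names are placeholders
I made up, and the simple 0-(N-1) mapping is an assumption that may not
be the best one for a given node.

```python
import subprocess
import time


def build_namd_command(threads, config="bench.namd", binary="namd2"):
    """Return an argv list for one trial with `threads` worker threads.

    Mirrors the pattern from the quoted message:
        namd2 +ppn68 +setcpuaffinity +pemap 0-67
    `config` and `binary` are placeholder names, not taken from the thread.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Assumption: pin the workers to logical CPUs 0..threads-1.
    pemap = "0" if threads == 1 else f"0-{threads - 1}"
    return [binary, f"+ppn{threads}", "+setcpuaffinity", "+pemap", pemap, config]


def time_command(argv):
    """Run one trial and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(thread_counts, runner=time_command, **kwargs):
    """Time each candidate and return (threads, seconds) pairs, fastest first."""
    results = [(n, runner(build_namd_command(n, **kwargs))) for n in thread_counts]
    return sorted(results, key=lambda pair: pair[1])
```

Calling sweep([4, 68, 136, 272]) on the node would run four short trials
and list them fastest first. Passing a different runner lets you check
the generated command lines without launching anything.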
On 09/29/2017 02:12 PM, jing liang wrote:
> I downloaded the NAMD compiled version:
> Linux-KNL-multicore (Intel Xeon Phi KNL processor single node)
> The node where I am testing NAMD is an Intel Xeon Phi 7250 (Knights Landing).
> Each core can use 4 hardware threads. My testing system consisted of
> 20,000 atoms.
> I did a comparison with other versions of NAMD installed on regular
> nodes (not KNL), which have 28 cores (also Intel): an SMP version and
> a CUDA version.
> The results are:
> Version                 Time (sec)
> SMP (28 threads)        80
> GPU (1 node, 2 GPUs)    50
> KNL                     160
> In the case of KNL, 160 sec. was the best performance I obtained after
> several trials using the setup:
> namd2 +ppn68 +setcpuaffinity +pemap 0-67
> I wonder if the performance of NAMD in the KNL case is lowered because
> I am not using the proper setup for running it (ppn, pemap), or maybe
> because I am using a pre-compiled version.
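
For reference, the gap in the quoted timings can be worked out with a few
lines of arithmetic. This is only a sketch: the numbers are copied from
the table in the quoted message, and the function names are my own.

```python
# Wall-clock times in seconds, copied from the table in the quoted message.
timings = {
    "SMP (28 threads)": 80,
    "GPU (1 node, 2 GPUs)": 50,
    "KNL": 160,
}


def slowdown(timings, target, baseline):
    """How many times longer `target` took than `baseline`."""
    return timings[target] / timings[baseline]


def relative_to_fastest(timings):
    """Map each version to its run time divided by the fastest run time."""
    best = min(timings.values())
    return {name: seconds / best for name, seconds in timings.items()}
```

With these numbers the KNL run took 2 times as long as the SMP run and
3.2 times as long as the GPU run.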