From: Jérôme Hénin (jerome.henin_at_ibpc.fr)
Date: Tue Feb 06 2018 - 06:31:10 CST
This CPU has 16 physical cores; the 32 "cores" it reports are hardware
threads (hyperthreading). You may get similar throughput with just 16
threads. At any rate, 64 seems excessive.
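
One way to check the physical versus virtual core count on a node is to count distinct (core, socket) pairs in the parseable output of `lscpu`. This is only a sketch: the helper name `count_topology` is made up for illustration, and it assumes a Linux node with util-linux `lscpu` and a POSIX `awk`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lscpu -p=CPU,CORE,SOCKET` style lines on stdin
# (one "cpu,core,socket" triple per line) and reports how many logical CPUs
# and how many distinct physical cores they describe.
count_topology() {
  # Lines starting with '#' are header comments in lscpu's parseable output.
  awk -F, '!/^#/ && NF >= 3 { logical++; cores[$2 "," $3] = 1 }
           END {
             n = 0
             for (k in cores) n++
             printf "%d logical, %d physical\n", logical, n
           }'
}

# Usage on a compute node:
#   lscpu -p=CPU,CORE,SOCKET | count_topology
```

If the two numbers differ by a factor of two, hyperthreading is enabled and only the smaller number corresponds to independent execution units.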
On 6 February 2018 at 09:41, Souvik Sinha <souvik.sinha893_at_gmail.com> wrote:
> I am using a cluster whose nodes each have 32 CPU cores with two hardware
> threads per core; the processor is an "Intel(R) Xeon(R) CPU E5-2683 v4". I
> am currently using the "NAMD_2.12_Linux-x86_64-multicore" binary on that
> cluster. I am not sure how to distribute jobs across hyperthreaded cores.
> So, to check the performance of the hyperthreaded cores, I tried the
> following commands; the resulting "Benchmark time" values are given:
> charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p64 <inputfile> : Benchmark: 0.170056 days/ns
> charmrun namd2 +p64 <inputfile> : Benchmark: 0.168904 days/ns
> charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p32 <inputfile> : Benchmark: 0.228512 days/ns
> charmrun namd2 +p32 <inputfile> : Benchmark: 0.157081 days/ns
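
Since days/ns is a "lower is better" figure, it can be easier to compare the four timings quoted above as ns/day. The sketch below just does that arithmetic with `awk`; the function names `ns_per_day` and `slowdown_pct` are made up for illustration, and the labels are shorthand for the command-line options used in each run.

```shell
#!/usr/bin/env bash
# Convert a NAMD "days/ns" benchmark figure to ns/day (higher is faster).
ns_per_day() {
  awk -v d="$1" 'BEGIN { printf "%.4f\n", 1 / d }'
}

# Relative slowdown of run A versus run B in percent (positive = A is slower).
# Both arguments are days/ns values.
slowdown_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# The four timings from the message, labelled by their options.
for entry in "pemap,+p64:0.170056" "default,+p64:0.168904" \
             "pemap,+p32:0.228512" "default,+p32:0.157081"; do
  label=${entry%%:*}   # text before the colon
  days=${entry##*:}    # days/ns value after the colon
  printf '%-14s %s ns/day\n' "$label" "$(ns_per_day "$days")"
done

# How much slower the pinned +p32 run is than the unpinned +p32 run.
printf 'pinned +p32 vs default +p32: %s%% slower\n' \
  "$(slowdown_pct 0.228512 0.157081)"
```

Put this way, the two +p64 runs and the unpinned +p32 run are close to each other, and only the pinned +p32 run stands out.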
> I can't see how "+setcpuaffinity" helps, since runs without an explicit
> mapping of PEs to threads perform fine (judging by the benchmark times).
> Does the multicore binary, by default, distribute processes over all
> available threads (i.e. 64 in this case), so that there is no need for
> "+setcpuaffinity"? If that is true, then why do the benchmark times with
> and without "+setcpuaffinity" differ significantly when launching 32
> processes?
> Please help me to understand this. Thank you.
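
A possible explanation for the slow pinned +p32 run lies in how the map is expanded. As I read the Charm++ documentation, a "+O" suffix in `+pemap` adds the offset to each CPU in the range, so `0-31+32` would mean 0,32,1,33,... rather than 0..31 followed by 32..63. The sketch below expands only this simple "L-U+O" form under that assumption; `expand_pemap` is a made-up helper for illustration, not part of NAMD or Charm++.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a simple "L-U+O" map, one CPU number per line,
# ASSUMING the Charm++ meaning that each CPU in L..U is immediately followed
# by that CPU plus the offset O (e.g. 0-3+4 -> 0 4 1 5 2 6 3 7).
expand_pemap() {
  local spec=$1 range offset lo hi cpu
  range=${spec%%+*}   # part before the '+', e.g. "0-31"
  offset=${spec#*+}   # part after the '+', e.g. "32"
  lo=${range%-*}      # lower bound of the range
  hi=${range#*-}      # upper bound of the range
  for ((cpu = lo; cpu <= hi; cpu++)); do
    printf '%d\n%d\n' "$cpu" "$((cpu + offset))"
  done
}

# CPUs that the first 32 PEs would be bound to under this reading:
#   expand_pemap 0-31+32 | head -n 32 | paste -sd' ' -
```

Under this reading, +p32 would bind its threads to CPUs 0,32,1,33,...,15,47. If CPUs n and n+32 are hyperthread siblings on this node (which can be checked in /sys/devices/system/cpu/cpu0/topology/thread_siblings_list), those 32 threads would share only 16 physical cores, which would be consistent with the slower benchmark. Without +setcpuaffinity the operating system scheduler is free to spread the threads across cores.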
> Souvik Sinha
> Research Fellow
> Bioinformatics Centre (SGD LAB)
> Bose Institute
> Contact: 033 25693275