I am using a cluster whose nodes each have 32 CPU cores ("Intel(R) Xeon(R)
CPU E5-2683 v4"), with two hardware threads per core, i.e. 64 logical CPUs
per node. I am currently running the "NAMD_2.12_Linux-x86_64-multicore"
binary on this cluster, and I am not sure how to distribute a job across the
hyperthreads. To check the performance of the hyperthreaded cores, I tried
the following commands; the resulting benchmark time is given below each
one:

  charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p64 <inputfile>
      Benchmark: 0.170056 days/ns

  charmrun namd2 +p64 <inputfile>
      Benchmark: 0.168904 days/ns

  charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p32 <inputfile>
      Benchmark: 0.228512 days/ns

  charmrun namd2 +p32 <inputfile>
      Benchmark: 0.157081 days/ns
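For easier comparison, the same four benchmark figures can be converted to
throughput (ns/day, the reciprocal of days/ns) and compared against the
fastest run. A small Python sketch; the dictionary keys are only shorthand
labels for the four runs listed above:

```python
# Benchmark times reported above (days per ns; lower is faster).
results = {
    "+setcpuaffinity +pemap 0-31+32 +p64": 0.170056,
    "+p64": 0.168904,
    "+setcpuaffinity +pemap 0-31+32 +p32": 0.228512,
    "+p32": 0.157081,
}


def ns_per_day(days_per_ns: float) -> float:
    """Convert NAMD's 'days/ns' benchmark figure into throughput (ns/day)."""
    return 1.0 / days_per_ns


best = min(results.values())
# Print the runs from fastest to slowest, with throughput and ratio to the best time.
for flags, d in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{flags:40s} {d:.6f} days/ns  {ns_per_day(d):6.3f} ns/day  "
          f"{d / best:5.2f}x the best time")
```

Seen this way, the pinned +p32 run takes roughly 1.45 times as long as the
unpinned +p32 run, while the two +p64 runs are within about 1% of each other.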

I can't see how "+setcpuaffinity" is helping: the runs without an explicit
mapping of PEs onto hardware threads are as fast or faster (judging by the
benchmark times). Does the multicore binary, by default, distribute its PEs
over all available hardware threads (i.e. 64 in this case), so that
"+setcpuaffinity" is unnecessary? If so, why does the benchmark time differ
so much with and without "+setcpuaffinity" when launching 32 PEs (+p32)?
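One way to check how the 64 logical CPUs map onto the 32 physical cores
(and hence which CPU numbers belong together in a +pemap list) is to read the
Linux sysfs topology files. A minimal sketch, assuming a Linux node where
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list exists; the
helper names parse_cpu_list and sibling_groups are hypothetical:

```python
import glob
import os


def parse_cpu_list(text: str) -> list[int]:
    """Expand a Linux CPU list string such as '0,32' or '0-3,8' into integers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def sibling_groups(sysfs_root: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return the groups of logical CPUs that share one physical core (Linux only)."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)


if __name__ == "__main__":
    # A group such as (0, 32) would mean logical CPUs 0 and 32 are
    # two hardware threads of the same physical core.
    for group in sibling_groups():
        print(group)
    # Logical CPUs this process may run on (os.sched_getaffinity is Linux-only).
    if hasattr(os, "sched_getaffinity"):
        print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```

On a node where each group pairs CPU n with CPU n+32, logical CPUs 0-31 would
each sit on a different physical core; the actual numbering should be
confirmed on the node itself rather than assumed.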

Please help me to understand this. Thank you.

