Re: NAMD run on Intel hyperthreaded cores

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Tue Feb 06 2018 - 13:54:50 CST

Try running numactl -H to see how the "real" and the hyper-threaded
cores are numbered. In your case, the problem may be the argument to
+pemap, rather than +setcpuaffinity itself.
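
For example, if numactl -H shows that logical CPUs 0-31 are the physical cores
and 32-63 are their hyper-threaded siblings (that numbering is only an
assumption; check the actual output on your nodes), one PE per physical core
would be

charmrun namd2 +setcpuaffinity +pemap 0-31 +p32 <inputfile>

while two PEs per core, using all hardware threads, would be

charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p64 <inputfile>

If instead the hyper-thread pairs are interleaved (0/1, 2/3, ...), the maps
need to be adjusted accordingly.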

Giacomo

On Tue, Feb 6, 2018 at 9:20 AM, Souvik Sinha <souvik.sinha893_at_gmail.com>
wrote:

> Sorry, that's my mistake. It was actually 2 CPUs with 16 cores each and 2
> threads per core.
>
> Thanks for your reply.
>
> On Tue, Feb 6, 2018 at 6:01 PM, Jérôme Hénin <jerome.henin_at_ibpc.fr> wrote:
>
>> Hi Souvik,
>>
>> This CPU has 16 physical cores; the 32 that the OS reports are logical
>> (hyper-threaded). You may get similar
>> throughput with just 16 threads. At any rate, 64 seems excessive.
>>
>> https://ark.intel.com/products/91766/Intel-Xeon-Processor-E5-2683-v4-40M-Cache-2_10-GHz
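>>
>> As a quick comparison (just a suggestion), you could also time a run limited
>> to 16 threads, e.g.
>>
>> charmrun namd2 +p16 <inputfile>
>>
>> to see how much the virtual cores are actually buying you.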
>>
>> Jerome
>>
>> On 6 February 2018 at 09:41, Souvik Sinha <souvik.sinha893_at_gmail.com>
>> wrote:
>>
>>> Hi,
>>> I am working on a cluster whose nodes have 32 CPU cores, with two threads per
>>> core (hyper-threading); the processor is "Intel(R) Xeon(R) CPU E5-2683 v4". I
>>> am currently using the "NAMD_2.12_Linux-x86_64-multicore" binary on that
>>> cluster. I am not exactly sure how to distribute hyperthreaded jobs, so to
>>> check the performance of the hyper-threaded cores, I tried the following
>>> commands; the resulting "Benchmark times" are given below:
>>>
>>> charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p64 <inputfile> :
>>> Benchmark: 0.170056 days/ns
>>>
>>> charmrun namd2 +p64 <inputfile> : Benchmark: 0.168904 days/ns
>>>
>>> charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p32 <inputfile> :
>>> Benchmark: 0.228512 days/ns
>>>
>>> charmrun namd2 +p32 <inputfile> : Benchmark: 0.157081 days/ns
>>>
>>> I can't see how "+setcpuaffinity" is helping, since running without defining
>>> the mapping of PEs onto threads works just as well (judging by the benchmark
>>> times). Does the multicore binary, by default, distribute processes over all
>>> available threads (i.e. 64 in this case), so that there is no need for
>>> "+setcpuaffinity"? If that is true, then why do the benchmark times differ
>>> significantly with and without "+setcpuaffinity" when launching 32 processes?
>>>
>>> Please help me to understand this. Thank you.
>>>
>>> --
>>> Souvik Sinha
>>> Research Fellow
>>> Bioinformatics Centre (SGD LAB)
>>> Bose Institute
>>>
>>> Contact: 033 25693275
>>>
>>
>>
>
>
> --
> Souvik Sinha
> Research Fellow
> Bioinformatics Centre (SGD LAB)
> Bose Institute
>
> Contact: 033 25693275
>

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:50 CST