Re: Benefits of SandyBridge EP in NAMD2.9

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Jun 01 2012 - 09:19:50 CDT

On Fri, Jun 1, 2012 at 10:04 AM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
> On Fri, Jun 1, 2012 at 9:57 AM, Jérôme Hénin <jhenin_at_ifr88.cnrs-mrs.fr> wrote:
>> On a related note, would it make sense to try and reorder atoms in
>> memory by neighborhood, to make memory accesses more regular and
>> increase cache hits?
>
> absolutely. the speedup can be quite drastic.

i should add that NAMD is less affected because of
the way it is parallelized: its domain decomposition
is not based on the number of processors, but generates
smaller domains in order to do load balancing. also, it
may take a significant amount of time until the atoms of
an aqueous system become disordered in memory. with LAMMPS
i see a difference of about 5% develop over 100ps of MD.

one reference to look at would be:
Meloni, Rosati and Colombo, J Chem Phys, 126, 121102 (2007).
which is what the strategy employed in LAMMPS is based on.
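
to make the idea concrete, here is a minimal sketch of reordering atoms by
spatial cell. it is not the scheme from the paper above and not the actual
LAMMPS code; the names (cell_sort_order, mean_neighbor_index_gap, demo), the
non-periodic box and the parameter values are all made up for illustration.
the "mean index gap" between atoms within the cutoff is only a crude proxy for
how far apart interacting atoms sit in memory.

```python
import random


def cell_sort_order(positions, box_length, cell_size):
    """Return atom indices ordered so atoms in the same cell are contiguous."""
    ncell = max(1, int(box_length / cell_size))
    width = box_length / ncell

    def cell_id(p):
        # integer cell coordinates, clamped so x == box_length stays in range
        ix, iy, iz = (min(int(c / width), ncell - 1) for c in p)
        return (ix * ncell + iy) * ncell + iz

    # sorted() is stable, so atoms within one cell keep their relative order
    return sorted(range(len(positions)), key=lambda i: cell_id(positions[i]))


def mean_neighbor_index_gap(positions, cutoff):
    """Average |i - j| over all pairs within the cutoff (O(N^2), demo only)."""
    cut2 = cutoff * cutoff
    total = pairs = 0
    n = len(positions)
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < cut2:
                total += j - i
                pairs += 1
    return total / pairs if pairs else 0.0


def demo(n_atoms=500, box_length=10.0, cutoff=1.0, seed=1):
    """Compare the index gap for a scrambled and a cell-sorted atom order."""
    rng = random.Random(seed)
    scrambled = [
        tuple(rng.uniform(0.0, box_length) for _ in range(3))
        for _ in range(n_atoms)
    ]
    order = cell_sort_order(scrambled, box_length, cell_size=cutoff)
    reordered = [scrambled[i] for i in order]
    return (
        mean_neighbor_index_gap(scrambled, cutoff),
        mean_neighbor_index_gap(reordered, cutoff),
    )


if __name__ == "__main__":
    before, after = demo()
    print(f"mean neighbor index gap: {before:.1f} before, {after:.1f} after")
```

in a real code the same permutation would have to be applied to every per-atom
array (velocities, types, and so on), and the sort would be repeated only
occasionally, since atoms need time to diffuse away from their sorted order.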

axel.
>
> axel.
>
>>
>> Jerome
>>
>>
>> On 1 June 2012 15:35, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>> On Fri, Jun 1, 2012 at 7:52 AM, Florian Mrugalla
>>> <florian.mrugalla_at_uni-dortmund.de> wrote:
>>>> Dear NAMD mailing list subscribers,
>>>>
>>>> Lately I have been wondering whether NAMD 2.9 benefits from the new
>>>> AVX instruction set present in the SandyBridge EP processors.
>>>> If there is a benefit, how large would you estimate the effect to be?
>>>>
>>>> Is one of the precompiled versions capable of using the AVX instruction set,
>>>> or do I have to compile it from scratch?
>>>> If I have to compile it myself, are there any suggestions on how to
>>>> gain the most benefit from AVX?
>>>>
>>>> According to the link below, there should be a substantial speedup
>>>> for arithmetic-heavy processes.
>>>> http://www.hpcwire.com/hpcwire/2012-05-08/chips_on_the_table:_sandy_bridge_versus_westmere.html
>>>
>>> but classical MD forcefields like CHARMM or Amber were
>>> designed to have as little arithmetic as possible. performance
>>> is much more governed by memory bandwidth and cache sizes
>>> due to using neighbor lists and thus rather irregular memory
>>> accesses than by floating point performance. if you were doing
>>> linear algebra heavy stuff, things would be different...
>>>
>>> axel.
>>>>
>>>> Thanks in advance and Best Regards,
>>>> Florian
>>>>
>>>
>>>
>>>
>>> --
>>> Dr. Axel Kohlmeyer
>>> akohlmey_at_gmail.com  http://goo.gl/1wk0
>>>
>>> College of Science and Technology
>>> Temple University, Philadelphia PA, USA.
>>>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science and Technology
Temple University, Philadelphia PA, USA.