Re: NAMD speed on MPICH2 Ubuntu 64 bit Cluster

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Apr 06 2011 - 12:06:41 CDT

On Wed, Apr 6, 2011 at 12:19 PM, Robert McCarrick
<rob.mccarrick_at_muohio.edu> wrote:

> Hi Everyone,
> Just to give more information on this. If I use the following command
> (with the TCP optimized version of NAMD for x86_64 Linux):
>
> ./charmrun namd2 +p6 <configuration_file>
>
> I get a time of 0.0290901 s/step and I get 6 processes running on the main
> computer with a system load of 1.32. If I use the following command:
>
> ./charmrun namd2 +p24 <configuration_file>
>
> I get a time of 0.0497453 s/step and I get 6 processes on each of the four
> computers, but the main computer on which I executed the command has a
> load of 0.53 and each of the other three computers has a load of about
> 0.01, indicating that they aren't really doing much of anything even
> though they have 6 namd processes running. I have a nodelist file and
> all of the computers can SSH to each other without a password. The
> directory in which the NAMD and configuration files are contained is
> mirrored on the other three computers via NFS (all of the user UIDs and GIDs
> and permissions are carrying over fine). I've been searching online and
> haven't found any way to address this. As mentioned in the previous email,
> I also compiled the mpi-Linux-x86_64 version and it doesn't seem to help the
> problem. Any help would be greatly appreciated.
>

rob,

TCP/IP networking doesn't give you great scaling, because of the high
latencies. classical MD is quite sensitive to that, since you need to
communicate multiple times in each time step and the computing effort
for each step is rather small.
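
as a rough check on the numbers quoted above, speedup and parallel efficiency can be computed directly from the two reported step times. this is only a sketch: the function names are made up for illustration, and efficiency is taken relative to the 6-core single-node run, not a true serial run.

```python
# Timings reported in the quoted message (seconds per MD step).
T_6_CORES = 0.0290901    # ./charmrun namd2 +p6  (one node)
T_24_CORES = 0.0497453   # ./charmrun namd2 +p24 (four nodes)


def speedup(t_base, t_new):
    """Ratio of baseline time to new time; values below 1 mean a slowdown."""
    return t_base / t_new


def efficiency(t_base, p_base, t_new, p_new):
    """Speedup divided by the factor by which the core count grew."""
    return speedup(t_base, t_new) / (p_new / p_base)


s = speedup(T_6_CORES, T_24_CORES)
e = efficiency(T_6_CORES, 6, T_24_CORES, 24)
print(f"speedup: {s:.2f}x, parallel efficiency: {e:.1%}")
```

a speedup below 1 means the four-node run is slower per step than the single-node run, which matches the slowdown described in the quoted message.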

now NAMD can do _some_ latency hiding, and thus does much better over
TCP/IP than most other codes that i know. nevertheless, with 6 cores per
node, you are really pushing the limit. you may benefit from the multi-core
version that is now provided with version 2.8b1, as that will limit the
communication to one task (instead of 6 tasks fighting for access to the
network).

if you really want good performance, you need to consider buying a fast
low-latency interconnect. there are several of them with different
properties and costs associated. the most popular currently seems to be
infiniband, which seems to be a good match. i am seeing very good scaling
behavior of NAMD (or rather charm++) using the IBVERBS library interface.
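
to illustrate why latency matters so much here, a toy model of the time per step might look like the sketch below. everything in it is an assumption made for illustration: the model itself (compute time divided evenly across nodes plus a fixed number of sequential message latencies), the message count, and the latency figures, which are order-of-magnitude guesses rather than measurements. it ignores bandwidth, load imbalance, and NAMD's latency hiding, so it will not reproduce the timings reported above.

```python
def step_time(compute_s, n_nodes, msgs_per_step, latency_s):
    """Toy estimate of wall time per MD step (hypothetical model).

    compute_s      time per step on a single node
    n_nodes        number of nodes sharing the work
    msgs_per_step  sequential network messages per step (assumed)
    latency_s      one-way network latency in seconds (assumed)
    """
    comm = msgs_per_step * latency_s if n_nodes > 1 else 0.0
    return compute_s / n_nodes + comm


COMPUTE_S = 0.029        # roughly the single-node time from the quoted message
MSGS_PER_STEP = 100      # hypothetical value, not measured
LATENCY_TCP = 50e-6      # assumed order of magnitude for TCP over gigabit ethernet
LATENCY_IB = 2e-6        # assumed order of magnitude for infiniband

for label, lat in (("tcp", LATENCY_TCP), ("infiniband", LATENCY_IB)):
    t = step_time(COMPUTE_S, 4, MSGS_PER_STEP, lat)
    print(f"{label}: {t:.5f} s/step on 4 nodes")
```

in this model the communication term stays constant while the compute term shrinks with node count, so the higher the latency, the sooner adding nodes stops paying off.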

cheers,
     axel.

> Thanks,
> Rob
>
>
> On Tue, 2011-04-05 at 14:51 -0400, Robert McCarrick wrote:
>
> Hi Everyone,
> I'm new to the cluster computer world. I've built a four-computer cluster,
> each with a 6-core AMD Phenom processor running Ubuntu Server 10.10 64 bit.
> I've tried both the TCP optimized version of NAMD and compiling from scratch
> with the mpi-Linux-X86_64 build of Charm. In all cases, I'm getting about a
> 4-fold reduction in calculation times when I run the job utilizing all four
> computers (i.e. going from +p6 to +p24 causes a big slowdown). This seems
> odd and I was wondering if anyone had any suggestions as to where I might
> have gone wrong.
> Rob
>
> --
> Robert M. McCarrick, Ph.D.
> EPR Instrumentation Specialist
> Department of Chemistry and Biochemistry
> Miami University
> 701 E. High Street
> 101 Hughes Laboratories
> Oxford, OH 45056
> 513.529.0507 CW Room
> 513.529.6829 Pulse Room
> 513.529.5715 fax
> rob.mccarrick_at_muohio.edu
> http://epr.muohio.edu
>
>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
