Re: multi-cluster configuration

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Mon Aug 24 2009 - 17:12:42 CDT

On Mon, Aug 24, 2009 at 5:18 PM, Piotr Kopta <pkopta_at_man.poznan.pl> wrote:
> Hi all,

hi piotr,

> I'm trying to launch NAMD in a multi-cluster configuration, so I have a few
> questions about the NAMD architecture and its constraints:
> - when I launch NAMD across 32- and 64-bit machines, I get the error
> 'Unknown message type' - does this mean that NAMD can't be executed
> simultaneously on different architectures? (on both machines I used
> version 2.7b1 compiled with gcc/mpi support)

you would have to use the same 32-bit version for both
the 32-bit and the 64-bit nodes.

> - can NAMD adapt to a multi-cluster architecture, with fast inter-node
> connections and relatively slow inter-cluster connections? has anybody
> performed such tests?

i don't think that this is worth the hassle. most of the time parallel
applications don't do proper load balancing and thus would require a
homogeneous selection of nodes anyway. what would be the best
strategy to proceed depends a lot on what kind of clusters, with how
many nodes of what type you want to "connect". i would rather set
things up so that jobs get only placed in homogeneous environments.
in my experience a key point in cluster configuration is to make it
easy for users to do the right thing. a complex multi-architecture
setup is just inviting users to make mistakes.

> - do such multi-cluster architectures require different scheduler settings
> than a single cluster configuration in order to obtain optimal performance
> ?

it is possible to run multiple clusters from a single front end and scheduler
and have the job scheduler place jobs so that they do not cross subcluster
or network boundaries and thus always run with maximum efficiency.
i would still try to place different architectures into separate queues.
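
The placement rule described above (a job never crosses a subcluster or network boundary) can be illustrated with a minimal sketch. This is not maui/torque code or configuration; the function name `place_job`, the subcluster names, and the best-fit tie-breaking are assumptions made only for illustration.

```python
from typing import Dict, List, Optional


def place_job(free_nodes: Dict[str, List[str]], nodes_needed: int) -> Optional[List[str]]:
    """Pick nodes for a job from exactly one subcluster.

    free_nodes maps a (hypothetical) subcluster name to its idle nodes.
    Returns None when no single subcluster can hold the job, even if the
    total number of idle nodes across subclusters would be sufficient.
    """
    # Only subclusters that can hold the whole job are candidates.
    candidates = [
        (len(nodes), name)
        for name, nodes in free_nodes.items()
        if len(nodes) >= nodes_needed
    ]
    if not candidates:
        return None
    # Assumption: best fit, i.e. use the smallest subcluster that is large
    # enough, so that bigger subclusters stay free for bigger jobs.
    _, name = min(candidates)
    return free_nodes[name][:nodes_needed]


free = {"ib32": ["a1", "a2"], "ib64": ["b1", "b2", "b3", "b4"]}
small = place_job(free, 2)      # fits in the smaller subcluster
large = place_job(free, 3)      # only the larger subcluster fits
too_big = place_job(free, 5)    # 6 nodes are idle in total, but not in one subcluster
```

In this sketch a job that does not fit into a single subcluster simply waits, rather than being spread over the slow inter-cluster link.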

> - is it possible to launch NAMD simultaneously on two different
> architectures, e.g. some processes on x86_64 and a few on GPU-accelerated
> machines?

i don't think so, and i would doubt that there is much gain. from my limited
experiments the GPU accelerated nodes would have a different performance
optimum than non-GPU accelerated nodes. depending on how many
GPU nodes you have and how they are set up, i would either put them on
a separate queue or define an additional node flag so that GPU jobs would
only be placed on the subset of nodes in a cluster that have GPUs. if there
are no requests for GPUs, those nodes would double as regular CPU nodes.
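
The node-flag idea can be sketched as follows. This is a toy model and not actual scheduler configuration; the `Node` class, the `pick_nodes` function, and the flag name `gpu` are hypothetical. Handing out unflagged nodes first to CPU jobs is an added assumption, not something stated above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    name: str
    flags: FrozenSet[str] = frozenset()  # e.g. frozenset({"gpu"})
    busy: bool = False


def pick_nodes(nodes: List[Node], count: int,
               required_flag: Optional[str] = None) -> Optional[List[str]]:
    """Return names of `count` idle nodes, or None if the request cannot be met."""
    idle = [n for n in nodes if not n.busy]
    if required_flag is not None:
        # Jobs that request the flag may only run on nodes that carry it.
        eligible = [n for n in idle if required_flag in n.flags]
    else:
        # Jobs without a flag request may use any node, so flagged nodes
        # double as regular nodes. Assumption: unflagged nodes are used
        # first (stable sort), keeping flagged nodes free when possible.
        eligible = sorted(idle, key=lambda n: len(n.flags))
    if len(eligible) < count:
        return None
    return [n.name for n in eligible[:count]]


cluster = [
    Node("n1", frozenset({"gpu"})),
    Node("n2"),
    Node("n3"),
    Node("n4", frozenset({"gpu"})),
]
gpu_job = pick_nodes(cluster, 2, required_flag="gpu")
cpu_job = pick_nodes(cluster, 3)
oversized_gpu_job = pick_nodes(cluster, 3, required_flag="gpu")
```

Here the GPU job lands only on flagged nodes, while the CPU job fills the plain nodes first and then spills onto a GPU node.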

i can provide some example maui/torque configuration files, and some
more detailed explanations of our strategies, if needed.

cheers,
    axel.

> Any information would be greatly appreciated.
>
> Thank you in advance,
>
> Piotr Kopta
> email: pkopta_at_man.poznan.pl
> Poznan Supercomputing and Networking Center
>
>

-- 
Dr. Axel Kohlmeyer    akohlmey_at_gmail.com
Institute for Computational Molecular Science
College of Science and Technology
Temple University, Philadelphia PA, USA.
