Re: Running CUDA-enabled NAMD on multiple nodes without InfiniBand?

From: Norman Geist
Date: Mon Dec 01 2014 - 09:19:24 CST

> -----Original Message-----
> From: [] On
> behalf of Michel van der List
> Sent: Monday, 1 December 2014 15:41
> To:
> Subject: namd-l: Running CUDA-enabled NAMD on multiple nodes without
> InfiniBand?
> All,


> I've been working with a researcher here at Penn to try to get the
> CUDA-enabled NAMD running on multiple nodes (in this particular case
> using Amazon Web Services). So far I've been successful in getting the
> CUDA version (Linux-x86_64-multicore-CUDA) running as well as the TCP
> based multi-CPU/system (Linux-x86_64-TCP), using 2 systems with a
> total of 32 virtual CPUs (the instance types were g2.2xlarge and
> c3.4xlarge for those of you familiar with AWS).

No, I'm not.

> However, the CUDA version I downloaded does not supply the charmrun
> binary to use

In that case you most likely downloaded a multicore build, which runs on a
single node only.

> with multiple nodes and trying to use the charmrun from the TCP based
> version did not work (it appears it starts the GPU versions on the
> specified nodes, but it does not progress and the binaries appear to be
> going into some sort of CPU loop).
> My questions:
> - Is this possible at all, or is the CUDA code tightly coupled with the
> infiniband code?

Yes, it is easily possible. No, InfiniBand is not required.

> - If this is possible (and presumably I'd need to compile the code
> myself) what compilation
> options should I be looking at in the documentation?

Simply download the source and read notes.txt. You can copy almost all of
the commands from that file straight into the console.
You will also need CUDA 6.0 to be installed. You can stay with the gnu

If you need further help, let us know.

PS: The reason InfiniBand (or another high-bandwidth, low-latency network)
is recommended for CUDA-enabled NAMD is that the demands on the
inter-process network scale with the computing power of the nodes. Because
GPUs raise the computing power by orders of magnitude, a Gbit network is
highly insufficient and you will NOT gain any speedup from using multiple
nodes, except perhaps for very, very large systems.

The easiest illustration is:

The greater the computing power per node for a given problem (system size),
the smaller the sub-problems are for the nodes, the faster those
sub-problems are finished and their results sent around, and the more often
new sub-problems have to be distributed to the nodes. All of this results in
more network traffic. It means the nodes spend more time waiting for work
than they spend actually doing work. See "Amdahl's law".
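
To put rough numbers on that argument, here is a minimal sketch of a toy
model in Python. It is not how NAMD actually schedules or communicates: it
simply assumes each step costs an evenly divided compute share plus one
fixed network exchange. The function names and every number in it (step
times, latency, message size) are hypothetical and chosen only for
illustration; they are not measurements.

```python
def step_time(compute_s, nodes, latency_s, bytes_per_step, bandwidth_bytes_per_s):
    """Toy model: wall time per step = compute share + one network exchange."""
    if nodes == 1:
        return compute_s  # single node: no inter-node traffic in this model
    comm_s = latency_s + bytes_per_step / bandwidth_bytes_per_s
    return compute_s / nodes + comm_s


def speedup(compute_s, nodes, **net):
    """Speedup of `nodes` nodes relative to a single node."""
    return step_time(compute_s, 1, **net) / step_time(compute_s, nodes, **net)


# Hypothetical Gbit Ethernet: 100 us latency, 1 MB exchanged per step,
# 125 MB/s bandwidth. All values are assumptions for illustration.
gbit = dict(latency_s=100e-6, bytes_per_step=1e6, bandwidth_bytes_per_s=125e6)

cpu_step = 0.200  # seconds per step on one CPU-only node (made up)
gpu_step = 0.010  # seconds per step on one GPU node (made up)

for label, t in (("CPU", cpu_step), ("GPU", gpu_step)):
    print(label, [round(speedup(t, n, **gbit), 2) for n in (1, 2, 4)])
```

With these made-up values the CPU-only nodes still gain from a second node,
while the GPU nodes end up slower on two nodes than on one, because the
fixed network cost is now larger than the compute share it is supposed to
hide behind.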

> Please note that I know nothing about NAMD (just enough to get the
> benchmarks to run),
> but I'm very familiar with how to compile/run software.
> Thanks!
> --
> Michel van der List
> Information Systems and Computing
> (o) +1 215 898 5790
