RE: Running CUDA-enabled NAMD on multiple nodes without InfiniBand?

From: Michel van der List
Date: Wed Dec 03 2014 - 12:06:03 CST

Thank you for your reply.

>> I've been working with a researcher here at Penn to try to
>> get the CUDA-enabled NAMD running on multiple nodes (in this
>> particular case using Amazon Web Services). So far I've been
>> successful in getting the CUDA version (Linux-x86_64-multicore-
>> CUDA) running as well as the TCP based multi-CPU/system
>> (Linux-x86_64-TCP), using 2 systems with a total of 32 virtual
>> CPUs (the instance types were g2.2xlarge and c3.4xlarge for
>> those of you familiar with AWS).
>No, I'm not.
>> However, the CUDA version I downloaded does not supply the
>> charmrun binary to use
>In this case you likely downloaded a multicore version which is

That is correct (Linux-x86_64-multicore-CUDA).

>> with multiple nodes and trying to use the charmrun from the
>> TCP based version did not work (it appears it starts the GPU
>> versions on the specified nodes, but it does not progress and
>> the binaries appear to be going into some sort of CPU loop).
>> My questions:
>> - Is this possible at all, or is the CUDA code
>> tightly coupled with the infiniband code?
>Yes, it's easily possible. No, InfiniBand is not required.
>> - If this is possible (and presumably I'd need to compile the
>> code myself) what compilation options should I be looking
>> at in the documentation?
>Simply download the source and see notes.txt. You can copy
>almost all the commands from the file into the console. You will
>also need CUDA 6.0 to be installed. You can stay with the gnu
>If you need further help, let us know.

I got all of that done and ended up with the executables. I'm
working with the researcher to confirm that it's actually working
correctly. Early indications are that using multiple GPUs at AWS
does speed things up. I ran the apoa1.namd test and the time it
took dropped from 48.5 to 29.7 seconds, which seems like
a reasonable payoff. I haven't yet looked into whether I can get
even better network throughput at AWS if I change some of my
configuration. If there is any interest, I'm happy to share the
rest of our experience.
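
As a rough check on those numbers, here is a minimal Python sketch of the
implied speedup. The two timings are the ones quoted above; the node count
of 2 is an assumption (the message does not state exactly which
configurations produced each timing), so the efficiency figure is
illustrative only.

```python
# Wall-clock times for the apoa1.namd test, as quoted in the message.
t_before = 48.5  # seconds
t_after = 29.7   # seconds

# Assumption: the slower run used one GPU node and the faster run used two.
nodes = 2

speedup = t_before / t_after      # how many times faster the second run was
efficiency = speedup / nodes      # fraction of ideal linear scaling achieved

print(f"speedup: {speedup:.2f}x")
print(f"parallel efficiency on {nodes} nodes: {efficiency:.0%}")
```

Under that assumption the second node delivers well under a full doubling,
which is consistent with the network being a noticeable part of the cost.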

>PS: The reason InfiniBand (or another high-bandwidth, low-latency
>network) is recommended for CUDA-enabled NAMD is that the demands
>on the inter-process network scale with the computing power of
>the nodes. As GPUs raise the computing power by orders of
>magnitude, a Gbit network is highly insufficient and you will NOT
>gain any speedup by using multiple nodes, except maybe for very,
>very large systems.
>The easiest illustration is:
>The bigger the computing power per node for a given problem
>(system size), the smaller the sub-problems the nodes have to
>solve, the faster those sub-problems are done and results are sent
>around, and the more often new sub-problems need to be distributed
>to the nodes. This all results in more network traffic. This
>means the nodes spend more time waiting for work than they
>spend actually doing work. See "Amdahl's Law".

That makes a lot of sense.
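
The argument in the PS can be sketched with a toy model in Python. This is
only an illustration, not a measurement of NAMD: the work size, node speeds,
and per-step communication cost below are made-up numbers, and the function
names are hypothetical. The model assumes compute time shrinks with node
speed and node count while the communication cost per step stays fixed.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's Law: upper bound on speedup with n workers."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def step_time(work: float, node_speed: float, nodes: int, comm: float) -> float:
    """Toy per-step time: compute part plus a fixed communication cost.

    Assumption: a single node pays no network cost at all.
    """
    compute = work / (node_speed * nodes)
    return compute + (comm if nodes > 1 else 0.0)


def two_node_speedup(node_speed: float, work: float = 100.0, comm: float = 5.0) -> float:
    """Speedup of 2 nodes over 1 node in the toy model (illustrative numbers)."""
    return step_time(work, node_speed, 1, comm) / step_time(work, node_speed, 2, comm)


# Slow (CPU-like) nodes: compute dominates, so a second node helps.
print(f"slow nodes, 2-node speedup: {two_node_speedup(1.0):.2f}x")
# Fast (GPU-like) nodes: the same network cost now dominates each step.
print(f"fast nodes, 2-node speedup: {two_node_speedup(20.0):.2f}x")
# Amdahl's Law bound for a 90% parallel workload on 2 workers.
print(f"Amdahl bound (p=0.9, n=2): {amdahl_speedup(0.9, 2):.2f}x")
```

With these made-up numbers the slow nodes gain from a second node while the
fast nodes actually lose time, which is the effect described in the PS.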

Thanks again.
