Running CUDA-enabled NAMD on multiple nodes without InfiniBand?

From: Michel van der List (vmic_at_isc.upenn.edu)
Date: Mon Dec 01 2014 - 08:40:46 CST

All,

I've been working with a researcher here at Penn to try to get the CUDA-enabled NAMD
running on multiple nodes (in this particular case using Amazon Web Services). So far I've
been successful in getting the CUDA version (Linux-x86_64-multicore-CUDA) running,
as well as the TCP-based multi-CPU/multi-system version (Linux-x86_64-TCP), using 2 systems with a
total of 32 virtual CPUs (the instance types were g2.2xlarge and c3.4xlarge for those of
you familiar with AWS).

However, the CUDA version I downloaded does not supply the charmrun binary needed to run
on multiple nodes, and trying to use the charmrun from the TCP-based version did not
work (it appears to start the GPU binaries on the specified nodes, but the run does not
progress and the binaries appear to be stuck in some sort of CPU loop).

My questions:
- Is this possible at all, or is the CUDA code tightly coupled with the InfiniBand code?
- If this is possible (and presumably I'd need to compile the code myself) what compilation
  options should I be looking at in the documentation?

Please note that I know nothing about NAMD (just enough to get the benchmarks to run),
but I'm very familiar with how to compile/run software.

Thanks!

--
Michel van der List
vmic_at_isc.upenn.edu
Information Systems and Computing
(o) +1 215 898 5790
