Re: Running CUDA-enabled NAMD on multiple nodes without InfiniBand?

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Dec 04 2014 - 01:08:09 CST

Next message: Jim Phillips: "RE: Re: alchemical free energy simulations with parmtop"
Previous message: Michel van der List: "RE: Running CUDA-enabled NAMD on multiple nodes without InfiniBand?"
In reply to: Michel van der List: "RE: Running CUDA-enabled NAMD on multiple nodes without InfiniBand?"

I've read up a little on AWS, and it's quite likely that they actually
have an HPC network but run ordinary IP traffic across it (like IPoIB, IP
over InfiniBand).

So have fun with it ;)

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of Michel van der List
> Sent: Wednesday, December 3, 2014 19:06
> To: Norman Geist; namd-l_at_ks.uiuc.edu
> Subject: RE: namd-l: Running CUDA-enabled NAMD on multiple nodes
> without InfiniBand?
>
> Thank you for your reply.
>
> >> I've been working with a researcher here at Penn to try to
> >> get the CUDA-enabled NAMD running on multiple nodes (in this
> >> particular case using Amazon Web Services). So far I've been
> >> successful in getting the CUDA version (Linux-x86_64-multicore-
> >> CUDA) running as well as the TCP based multi-CPU/system
> >> (Linux-x86_64-TCP), using 2 systems with a total of 32 virtual
> >> CPUs (the instance types were g2.2xlarge and c3.4xlarge for
> >> those of you familiar with AWS).
> >
> >No, I'm not.
> >
> >> However, the CUDA version I downloaded does not supply the
> >> charmrun binary to use
> >
> >In this case you likely downloaded a multicore version which is
> >single-node.
>
> That is correct (Linux-x86_64-multicore-CUDA).
>
> >> with multiple nodes, and trying to use the charmrun from the
> >> TCP based version did not work (it appears it starts the GPU
> >> versions on the specified nodes, but it does not progress and
> >> the binaries appear to be going into some sort of CPU loop).
> >>
> >> My questions:
> >> - Is this possible at all, or is the CUDA code
> >> tightly coupled with the InfiniBand code?
> >
> >Yes, it's easily possible. No, InfiniBand is not required.
> >
> >> - If this is possible (and presumably I'd need to compile the
> >> code myself) what compilation options should I be looking
> >> at in the documentation?
> >
> >Simply download the source and see notes.txt. You can copy
> >almost all of the commands from that file into the console. You
> >will also need CUDA 6.0 installed. You can stay with the GNU
> >compilers.
> >
> >If you need further help, let us know.
>
> I got all of that done and ended up with the executables. I'm
> working with the researcher to confirm that it's actually working
> correctly. Early indications are that using multiple GPUs at AWS
> does speed things up. I ran the apoa1.namd test and the time it
> took dropped from 48.5 to 29.7 seconds, which seems like
> a reasonable payoff. I haven't yet looked into whether I can get
> even better network throughput at AWS if I change some of my
> configuration. If there is any interest, I'm happy to share the
> rest of our experience.
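
To put the apoa1 numbers above in perspective, here is a minimal sketch of the speedup and parallel-efficiency arithmetic. The two timings are the ones quoted; the node count of 2 and the reading of 48.5 s as the single-node baseline are assumptions, since the message does not state them for this particular run.

```python
def speedup(t_before: float, t_after: float) -> float:
    """Ratio of the old run time to the new run time."""
    return t_before / t_after


def parallel_efficiency(s: float, n_nodes: int) -> float:
    """Fraction of the ideal n-fold speedup that was actually achieved."""
    return s / n_nodes


t_single = 48.5  # seconds, from the message (assumed to be the single-node run)
t_multi = 29.7   # seconds, from the message (multi-node run)
n_nodes = 2      # ASSUMPTION: not stated for this test in the message

s = speedup(t_single, t_multi)
eff = parallel_efficiency(s, n_nodes)
print(f"speedup: {s:.2f}x, efficiency on {n_nodes} nodes: {eff:.0%}")
```

Under those assumptions the run is about 1.6 times faster, so a noticeable share of the second node's capacity is lost to communication or other overhead.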
>
> >PS: The reason why InfiniBand (or another high-bandwidth,
> >low-latency network) is recommended with CUDA-enabled NAMD is
> >that the demands on the inter-process network scale with the
> >computing power of the nodes. As GPUs raise the computing power
> >by orders of magnitude, a Gbit network is highly insufficient and
> >you will NOT gain any speedup by using multiple nodes, except
> >maybe for very, very large systems.
> >
> >The easiest illustration is:
> >
> >The bigger the computing power per node for a given problem
> >(system size), the smaller the subproblems each node has to
> >solve, the faster those subproblems are done and the results sent
> >around, and the more often new subproblems need to be distributed
> >to the nodes. This all results in more network traffic, which
> >means the nodes spend more time waiting for work than they
> >spend actually doing work. See "Amdahl's Law".
>
> That makes a lot of sense.
>
> Thanks again.
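
The scaling argument quoted above can be sketched with a toy model: the time per step is the compute time divided across the nodes, plus a fixed communication cost once more than one node is involved. This is a simplification in the spirit of Amdahl's Law, not a model of NAMD itself, and every number below (compute times, the 10 ms communication cost) is hypothetical rather than measured.

```python
def step_time(compute_s: float, n_nodes: int, comm_s: float) -> float:
    """Time per step: evenly divided compute plus a fixed network cost."""
    comm = comm_s if n_nodes > 1 else 0.0  # a single node needs no network
    return compute_s / n_nodes + comm


def model_speedup(compute_s: float, n_nodes: int, comm_s: float) -> float:
    """Speedup of n_nodes relative to one node under the toy model."""
    return step_time(compute_s, 1, comm_s) / step_time(compute_s, n_nodes, comm_s)


COMM_S = 0.010  # hypothetical: 10 ms of communication per step on a slow network

# Hypothetical single-node compute times per step for the same system.
cases = [("CPU-only node", 1.0), ("GPU node", 0.02)]

for label, compute_s in cases:
    for n in (2, 4, 8):
        s = model_speedup(compute_s, n, COMM_S)
        print(f"{label:14s} n={n}: speedup {s:.2f}x")
```

With these made-up numbers the slow nodes scale almost ideally, while the fast nodes gain nothing from a second node and can never exceed compute_s / COMM_S however many nodes are added. The same network that is adequate for CPU-only nodes becomes the bottleneck once each node is much faster.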


This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:23:05 CST