Re: GPU cluster

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue Jun 19 2012 - 14:40:38 CDT

On Tue, Jun 19, 2012 at 3:07 PM, Matthew B. Roark <Roarkma_at_wabash.edu> wrote:
>
> I currently use CHARMM on several Rocks clusters and I am looking to try out GPGPU.  I am looking into buying either a small GPU cluster or a few stand-alone workstations to use CUDA enabled NAMD.  I wanted to see what people are using and what people suggest.
>
> (1)  My main concern is that I need to have something that will positively work with NAMD.  Is there any hardware or vendors I should stay away from?  Are there vendors with "out-of-the-box' compatibility?

there is no simple answer to this. as long as you get recent and
sufficiently capable nvidia hardware, there is compatibility, but how
well it will work for your specific application can differ wildly.

most vendors go with what nvidia recommends, and the sales people
you end up dealing with in particular don't know much, if anything
at all. what nvidia and/or the vendors recommend is not always the
best choice, especially when you are on a budget. at the same time,
what the best choice is depends on how much effort you are willing
to invest in figuring out what suits your purpose, and on how hard
you can push vendors to offer you something that makes them less
money or goes against (unofficial?) agreements they have with nvidia.

the biggest question is whether you want to go with consumer grade
hardware or "professional" tesla GPUs. classical MD doesn't benefit
as much from the features of the tesla hardware as other applications
do, and GeForce GTX 580 cards provide an incredible price/performance
ratio compared to Tesla C2075 GPUs. on the other hand, you give up
ECC, more thoroughly tested hardware, and the better warranty.

from what has percolated through this mailing list over the last few
years, it seems that consumer grade hardware is best used in a
workstation-type environment, and tesla-type hardware makes the most
sense in a cluster environment (particularly the passively cooled
M series).

> (2)  How does scaling and efficiency work across multiple GPUs in the same server?  That is, how many GPUs can a server really make good use of?  I plan on testing with an 80k atom simulation.

that depends on the mainboard chipset. most can handle two GPUs well.
a dual intel tylersburg chipset mainboard (supermicro has one) can
handle up to 4 GPUs very well. such a mainboard with two 6-core CPUs
is probably the best choice for a single- or multi-workstation setup,
and that is probably also the limit of how far NAMD will scale well
for your input.

because memory and PCI-bus bandwidth are very important, you should
stay away from mainboards with integrated "PCIe bridges" (those let
you have eight 16-lane PCIe slots, but two cards have to share the
bandwidth and latency increases). the same goes for dual-GPU cards.
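
for example, you can check whether each card actually gets a full
16-lane link (and is not sitting behind a shared bridge) by looking
at the PCIe link status reported by the kernel. something along these
lines should work on most linux boxes (10de is nvidia's PCI vendor
id; the exact output format depends on your lspci version):

  # show link capability and negotiated link width for nvidia devices
  sudo lspci -vv -d 10de: | grep -E 'LnkCap|LnkSta'
  # a healthy slot reports something like "LnkSta: Speed 5GT/s, Width x16"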

> (3)  How much CPU power do I need to make use of multiple GPUs?  Will 8 or 16 cores suffice?

NAMD can overlap computational work that is not GPU accelerated
with GPU kernels and run them concurrently. NAMD can also attach
multiple threads to the same GPU and thus increase GPU utilization.
however, how well this works and how many threads per GPU are useful
depends on memory bandwidth, the computational complexity of the
model, and the size of the data set. in many cases, the optimum seems
to be around two to three CPU cores per GPU. it may sometimes be best
to leave some CPU cores idle so the GPUs run more efficiently. using
more threads per GPU can increase utilization, but it also increases
overhead, so there is an optimum. the clock rate of the CPU is usually
less important than good overall i/o bandwidth.
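
as a concrete (hypothetical) illustration, with a CUDA-enabled
multicore build of NAMD on a box with two GPUs you would probe the
threads-per-GPU ratio roughly like this (binary name, core counts,
device ids and the input file are just placeholders for your setup):

  # 2 CPU cores per GPU: 4 worker threads driving devices 0 and 1
  namd2 +p4 +idlepoll +devices 0,1 myjob.namd
  # 3 CPU cores per GPU: 6 worker threads on the same two devices
  namd2 +p6 +idlepoll +devices 0,1 myjob.namd

then compare the benchmark (days/ns) lines in the log output and keep
whichever ratio is fastest for your 80k-atom system.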

a final comment: try to resist the urge to purchase the very latest
(kepler) hardware. vendors will push it, but applications have not yet
caught up (that can sometimes take a few years), so you won't benefit.
if you want something that definitely works, it is always a good idea
to stick with tried-and-tested hardware that is closer to its
end-of-life than to its introduction.

HTH,
    axel.

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science and Technology
Temple University, Philadelphia PA, USA.
