Re: GPU cluster

From: Norman Geist
Date: Thu Jun 21 2012 - 00:45:48 CDT


To run across multiple nodes you will also need a high-speed network. My
cluster (3 nodes, 36 cores, 6 Tesla C2050) barely scales acceptably with
SDR InfiniBand (10 Gbit/s).
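
One rough way to put a number on "barely scales" is parallel efficiency:
the speedup over a single node divided by the number of nodes. The sketch
below is only an illustration; the function name and the timings are
placeholders, not measurements from the cluster described above.

```python
def parallel_efficiency(t_single_node: float, t_multi_node: float, nodes: int) -> float:
    """Return speedup / nodes for wall-clock times measured on the same input.

    t_single_node: time per MD step on one node (any consistent unit)
    t_multi_node:  time per MD step on `nodes` nodes (same unit)
    """
    if nodes < 1 or t_single_node <= 0 or t_multi_node <= 0:
        raise ValueError("times must be positive and nodes must be >= 1")
    speedup = t_single_node / t_multi_node
    return speedup / nodes


# Placeholder timings (ms/step), NOT real benchmark data.
eff = parallel_efficiency(t_single_node=60.0, t_multi_node=30.0, nodes=3)
print(f"efficiency on 3 nodes: {eff:.0%}")
```

An efficiency close to 100% means the extra nodes are fully used; values
well below that suggest the interconnect or load balance is the limit.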

Norman Geist.

> -----Original Message-----
> From: [] On
> Behalf Of Axel Kohlmeyer
> Sent: Tuesday, 19 June 2012 21:41
> To: Matthew B. Roark
> Cc:
> Subject: Re: namd-l: GPU cluster
> On Tue, Jun 19, 2012 at 3:07 PM, Matthew B. Roark <>
> wrote:
> >
> > I currently use CHARMM on several Rocks clusters and I am looking to
> > try out GPGPU.  I am looking into buying either a small GPU cluster
> > or a few stand-alone workstations to use CUDA-enabled NAMD.  I wanted
> > to see what people are using and what people suggest.
> >
> > (1)  My main concern is that I need to have something that will
> > positively work with NAMD.  Is there any hardware or vendors I should
> > stay away from?  Are there vendors with "out-of-the-box"
> > compatibility?
>
> there is no simple answer to this. as long as you get recent and
> capable enough nvidia hardware, there is compatibility, but how well
> it will work for your specific application can differ wildly.
> most vendors go with what nvidia recommends, and the sales people
> that you get to deal with in particular don't know much, if anything
> at all. what nvidia and/or vendors recommend is not always the best
> choice, particularly when you are on a budget. but at the same time,
> what is the best choice depends on how much effort you want to invest
> yourself in figuring out what is best for your purpose, and how well
> you are able to push vendors to offer you something that doesn't make
> them as much money or goes against (unofficial?) agreements they have
> with nvidia.
>
> the biggest question is whether you want to go with consumer grade
> hardware or "professional" tesla GPUs. classical MD won't benefit
> as much from the features of the tesla hardware as other applications,
> and GeForce GTX 580 cards provide an incredible price/performance
> ratio compared to Tesla C2075 GPUs. OTOH, with GeForce cards you don't
> get ECC, the more thoroughly tested hardware, or the better warranty.
> from what has percolated through this mailing list over the last few
> years, it seems that consumer grade hardware is best used in a
> workstation type environment, and tesla type hardware makes most sense
> in a cluster environment (particularly with the passively cooled
> M series).
> >
> > (2)  How does scaling and efficiency work across multiple GPUs in the
> > same server?  That is, how many GPUs can a server really make good
> > use of?  I plan on testing with an 80k atom simulation.
>
> that depends on the mainboard chipset. most can handle two GPUs well.
> a dual intel tylersburg chipset mainboard (supermicro has one) can
> handle up to 4 GPUs very well. such a mainboard with two 6-core CPUs
> is probably the best choice for a single/multiple workstation setup,
> and that is probably also the limit of how far you can scale NAMD well
> for your input. because memory and PCI-bus bandwidth are very
> important, you should stay away from mainboards with integrated
> "PCI-e bridges" (with those one can have 8 16-lane PCIe slots, but two
> cards have to share the bandwidth and latency is increased). the same
> goes for dual GPU cards.
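
The cost of sharing a slot's bandwidth can be estimated with simple
arithmetic. The sketch below assumes PCIe 2.0 (roughly 0.5 GB/s per lane
per direction, which is the generation used by the cards discussed here)
and an even split when both cards transfer at the same time; the function
name is made up for illustration, and real throughput will be lower than
these theoretical peaks.

```python
# Theoretical PCIe 2.0 payload rate: about 0.5 GB/s per lane, per direction.
PCIE2_GB_PER_S_PER_LANE = 0.5


def per_gpu_bandwidth(lanes: int, gpus_sharing: int) -> float:
    """Peak GB/s available to each GPU if `gpus_sharing` cards split one link evenly."""
    if lanes < 1 or gpus_sharing < 1:
        raise ValueError("lanes and gpus_sharing must be >= 1")
    return lanes * PCIE2_GB_PER_S_PER_LANE / gpus_sharing


print("dedicated x16 slot:      ", per_gpu_bandwidth(16, 1), "GB/s")
print("two cards behind a bridge:", per_gpu_bandwidth(16, 2), "GB/s each")
```

This covers only the bandwidth side; the added latency of a bridge is not
modelled.
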
> >
> > (3)  How much CPU power do I need to make use of multiple GPUs?  Will
> > 8 or 16 cores suffice?
>
> NAMD can overlap computational work that is not GPU accelerated
> with GPU kernels and run them concurrently. NAMD can also attach
> multiple threads to the same GPU and thus increase GPU utilization.
> however, how well this works and how many threads per CPU depends
> on memory bandwidth, the computational complexity of the model, and
> the size of the data set. in many cases, the optimum seems to be around
> two to three CPU cores per GPU. it may sometimes be best to leave
> CPU cores idle to run the GPUs more efficiently. using more threads
> per GPU can increase utilization, but also increases overhead, so
> there is an optimum. the clock rate of the CPU is usually less
> important than a good overall i/o bandwidth.
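
Since the best number of cores per GPU has to be found by testing, a small
sweep helps. The sketch below only builds candidate command lines and does
not run anything. It assumes a multicore CUDA build started as `namd2`
with the `+p`, `+idlepoll` and `+devices` options; the config file name
`bench.namd` and the function name are placeholders.

```python
def candidate_runs(num_gpus: int, total_cores: int, config: str = "bench.namd"):
    """Return (cores_per_gpu, argv) pairs, one per thread count worth benchmarking."""
    if num_gpus < 1 or total_cores < num_gpus:
        raise ValueError("need at least one GPU and one core per GPU")
    device_list = ",".join(str(i) for i in range(num_gpus))
    runs = []
    for cores_per_gpu in range(1, total_cores // num_gpus + 1):
        threads = cores_per_gpu * num_gpus
        # Assumed invocation; adjust the binary name and options to the local build.
        argv = ["namd2", f"+p{threads}", "+idlepoll", "+devices", device_list, config]
        runs.append((cores_per_gpu, argv))
    return runs


# Example: a workstation with 2 GPUs and 12 CPU cores.
for cores_per_gpu, argv in candidate_runs(num_gpus=2, total_cores=12):
    print(f"{cores_per_gpu} core(s) per GPU: {' '.join(argv)}")
```

Running each candidate on the same input and comparing the reported time
per step shows where the optimum lies for a given machine.
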
>
> a final comment: try to resist the urge to purchase the very latest
> (kepler) hardware. vendors will push it, but applications have not yet
> caught up (it can sometimes take a few years), so you won't benefit.
> if you want something that definitely works, it is always a good idea
> to stick with tried and tested hardware that is closer to its
> end-of-life than to its introduction.
> HTH,
> axel.
> >
> >
> --
> Dr. Axel Kohlmeyer
> College of Science and Technology
> Temple University, Philadelphia PA, USA.

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:10 CST