Re: Nvidia GPUs

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed Aug 21 2013 - 03:22:12 CDT

Hi Thomas,

 

the gaming cards are usually a little faster than the server cards,
because they do not have ECC enabled and/or are overclocked.

 

Depending on how large your cluster will be, you will want to use
professional Tesla series cards instead of consumer GPUs for better
administration, warranty and cooling. If you will only have a few machines,
say 4 or 5, it might be OK to stay with the GTX. But if you will not use
the cluster only for NAMD, you may need better double precision support or
more VRAM, so also think about possible future needs of the cluster system.
Remember that the GTX cards will not report GPU utilization or memory
errors, so with more than 5 nodes, each having 2 or more GPUs, you can have
a lot of trouble identifying a broken GPU. Additionally, the GTX cards are
not meant for 24/7 usage. If you develop your own codes, you may also want
to have the "secondary bus reset" feature available to reset stuck GPUs
instead of having to reboot the whole machine. It is only available on
Tesla and Quadro, and I guess hanging GPUs are common during coding and
testing. You also won't get power readings from the GTX cards, which you
might need.
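
To make the monitoring point concrete, here is a minimal Python sketch that looks for those blind spots. It assumes nvidia-smi is installed and that your driver accepts the query fields listed below (check `nvidia-smi --help-query-gpu`). The placeholder strings for unsupported readings and the helper names are my assumptions, not something NAMD or Nvidia prescribes.

```python
import csv
import io
import subprocess

# Query fields assumed to be accepted by `nvidia-smi --query-gpu`;
# verify them with `nvidia-smi --help-query-gpu` on your driver version.
FIELDS = [
    "index",
    "name",
    "utilization.gpu",
    "ecc.errors.uncorrected.volatile.total",
    "power.draw",
]

# Strings assumed to mean "this card does not report the value".
UNSUPPORTED = {"[Not Supported]", "[N/A]", "N/A"}


def parse_gpu_report(text):
    """Turn headerless nvidia-smi CSV output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [None if v.strip() in UNSUPPORTED else v.strip() for v in record]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def blind_spots(rows):
    """Map GPU index -> list of fields for which the card gave no reading."""
    result = {}
    for row in rows:
        missing = [f for f in FIELDS if row.get(f) is None]
        if missing:
            result[row["index"]] = missing
    return result


def query_gpus():
    """Run nvidia-smi on this node and parse its output."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_report(out)


if __name__ == "__main__":
    for index, missing in blind_spots(query_gpus()).items():
        print(f"GPU {index}: no readings for {', '.join(missing)}")
```

Run on every node (via ssh or your batch system), this would show which cards give you no utilization, ECC or power readings at all.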

 

So just think about the size of your cluster and keep in mind that you
will need to administer it and that you might have changing
requirements in the future or multiple projects with different needs.
Having ECC RAM in the server and the GPUs will additionally improve
stability, at least in theory. At the very least you will be informed about
broken RAM or GPUs, which saves you a lot of time that you can spend on your
scientific work.

 

One more note: Try to get the fastest node interconnect you can. It
should be at least 40 Gbit InfiniBand. This will save you from being
disappointed because you cannot bundle the computing power of your
cluster to run some really large jobs, and it lets you call it "a"
supercomputer, not only a bunch of single machines ;)
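
As a rough back-of-the-envelope illustration of why the link speed matters, the sketch below computes the ideal time to move a payload between two nodes. It ignores latency and protocol or encoding overhead, which matter a lot in practice. The 1 GB payload and the 1 Gbit Ethernet comparison are assumptions I picked for illustration, not measurements.

```python
def transfer_seconds(n_bytes, link_gbit_per_s):
    """Ideal transfer time in seconds, ignoring latency and protocol overhead."""
    bits = n_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    payload = 10**9  # hypothetical 1 GB exchanged between two nodes
    links = [("1 Gbit Ethernet", 1), ("40 Gbit InfiniBand", 40)]
    for name, rate in links:
        print(f"{name}: {transfer_seconds(payload, rate):.2f} s")
```

Real throughput will be lower than this ideal, and for tightly coupled jobs the latency of the interconnect can matter as much as its bandwidth.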

 

Best wishes

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Thomas C. Bishop
Sent: Tuesday, 20 August 2013 21:22
To: Thomas Albers
Cc: Namd Mailing List
Subject: Re: namd-l: Nvidia GPUs

 

Thomas

There's a generic cuda comparison of GTX cards at
http://www.tomshardware.com/reviews/geforce-gtx-760-review-gk104,3542-19.html
(different tom than me) that may be helpful.
Wikipedia is also rather extensive.

The earlier post from today is instructive with regard to cost:
"Re: namd-l: All CUDA devices are in prohibited mode, of compute capability
1.0, or otherwise unusable."

As for me, I use desktop GTX cards with CUDA for development, testing and
analysis, but server cards (purchased and maintained by server admins) for
production runs.

Tom

On 08/20/2013 01:27 PM, Thomas Albers wrote:

Dear List,
 
we are in the early stages of planning for a new cluster. Would
anyone on this list have experience (and timing results) to share with
recent Nvidia GPUs (GTX 660Ti, GTX 680, GTX 7xx)?
 
Regards,
Thomas
 

 
