From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Oct 02 2009 - 12:51:26 CDT
On Thu, 2009-10-01 at 18:27 -0400, Biff Forbush wrote:
> Hi All,
> In the coming weeks I plan to assemble a single node NAMD machine
> for short pre-production runs, equilibrations, VMD, etc., for membrane
> proteins (100k-200k atoms). The aim is to optimize performance in a
> single node system, without a strict limit on cost. I have followed the
> NAMD/VMD CUDA discussions with great interest, and it seems clear that
> given the not-quite-mature state of GPU utilization and stability, it is
> prudent to optimize CPU power, at the same time as providing for GPU
> capability, am I right?
yes and no. if you want the fastest single node machine ever, yes,
but keep in mind that extreme hardware always carries the risk of
being extremely sensitive to crashing and overloading components.
i would rather focus on getting a good balance on components that
have the main performance impact and otherwise crank it down a couple
of notches to have a more reliable machine. if jobs crash often,
then the faster processors are not worth that much.
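to put a number on that trade-off, here is a minimal sketch. the function name and all figures are made up for illustration and are not measurements; it only assumes that time lost to crashes and reruns scales the useful throughput down linearly.

```python
def effective_speed(raw_speed, crash_fraction):
    """Useful throughput after discounting wall-clock time lost to crashes.

    raw_speed      -- throughput when the job runs cleanly (e.g. ns/day)
    crash_fraction -- share of wall-clock time lost to crashes and reruns (0..1)
    """
    if not 0.0 <= crash_fraction <= 1.0:
        raise ValueError("crash_fraction must be between 0 and 1")
    return raw_speed * (1.0 - crash_fraction)


# hypothetical numbers, NOT measurements: a machine that is 10% faster
# but loses 15% of its time to crashes, against a slower one that loses 2%.
fast_but_flaky = effective_speed(1.10, 0.15)
slower_but_stable = effective_speed(1.00, 0.02)

print(f"fast but flaky:    {fast_but_flaky:.3f}")
print(f"slower but stable: {slower_but_stable:.3f}")
```

with these assumed inputs the nominally slower machine delivers more useful work; where the break-even point really lies depends on crash rates you would have to observe yourself.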
the key components for good GPU performance are the i/o bandwidth
of the mainboard and memory bandwidth of the CPU. at the high end,
cpu performance is always limited by the memory bandwidth, so the
higher you go with the clock rate, the less you get out of it unless
you are treating problems small enough to fit entirely into the
CPU cache. so rather than squeezing the last bit of performance
out of clock rate, you may also consider how the memory subsystem
can be optimized.
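one common way to reason about this is a roofline-style estimate: attainable performance is the smaller of the compute peak and memory bandwidth times arithmetic intensity. the sketch below is a simplified model, not something taken from NAMD, and every number in it is a hypothetical placeholder rather than a spec of any real CPU.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline-style bound: limited either by compute peak or by memory traffic."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# hypothetical placeholders: two clock rates on the same memory subsystem
MEM_BW_GBS = 30.0      # assumed sustained memory bandwidth, GB/s
PEAK_SLOW = 40.0       # assumed compute peak at the lower clock, GFLOP/s
PEAK_FAST = 45.0       # assumed compute peak at the higher clock, GFLOP/s

# low arithmetic intensity (data streamed from main memory): memory bound
low_slow = attainable_gflops(PEAK_SLOW, MEM_BW_GBS, 0.5)
low_fast = attainable_gflops(PEAK_FAST, MEM_BW_GBS, 0.5)

# high arithmetic intensity (working set reused from cache): compute bound
high_slow = attainable_gflops(PEAK_SLOW, MEM_BW_GBS, 4.0)
high_fast = attainable_gflops(PEAK_FAST, MEM_BW_GBS, 4.0)

print("memory bound: ", low_slow, low_fast)
print("compute bound:", high_slow, high_fast)
```

in the memory-bound case both clock rates give the same estimate, so the extra clock buys nothing; only in the compute-bound case does the faster part pull ahead.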
> For about $7K or so I am planning:
> Dual Xeon W5590 (3.3 GHz, calculating that the incremental system
> cost/Hz is actually nearly a constant, so you linearly get what you pay
this is a very optimistic assumption. i would expect more like an
exponential increase of the price over performance at the high end.
> Tyan s7205 mobo. As far as I have been able to find after reading a
> lot of specs, this is the only mobo with (4x) PCI-E.2x16 and dual Xeons
> -- please advise if there is other/better.
> Up to 4x GTX-295s, see below.
i don't know any details about specific mainboards, but
if you want a compute machine, you should make sure that you
have an additional graphics chip connected somewhere that
you can use for (textmode) output and that this does not
bring down the performance on any of the other PCIe busses.
ideally, it would be something from a different vendor that
is routed independently.
> I assume that with this week's announcement of Fermi, now is not the
> time to rush out and buy GTX-295s. Is it reasonable to guess that when
> the Fermi devices become available (guesses are 2-4 months?), they will
> be plug compatible with current CUDA software? It sounds as though
i would not hold my breath. from what i heard here at the GTC it seems
to be easier to validate the compute functionality than the graphics
functionality, and thus i would conclude that Tesla type devices with
no graphics output will come out first. you also should consider
the current price ratio between a GTX 295 and a Tesla C1060, and then
you can estimate how much a new generation of hardware with at least
double the theoretical performance will cost you.
> they may give rather dramatic improvement right out of the box? Is it
> also possible that the improved architecture will allow additional
> utilization in NAMD...?
it is not quite obvious to me how a machine like the one you describe
will hold up to the massive memory and bus bandwidth demands. with
the next-gen hardware those demands can only go up. how much performance
you will see and whether 4 GTX-295, 4 GTX-285, or 4 C1060 are the better
solution depends a lot on what you are going to do with the machine.
NAMD for example is able to schedule GPU kernels from multiple CPU tasks
into the same GPU, and since not all compute kernels are ported to CUDA,
the GPUs will stay idle for some of the time. with the GTX-295 you will
be able to fit more GPUs into one case, but at the same time those GPUs
will be slower (a GTX-295 is effectively two GTX-260 glued together and
sharing a PCIe slot through a bridge) than the fastest single GPU
in a GTX-285.
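to picture what sharing a GPU between CPU tasks means for the different card choices, here is a small sketch. the round-robin mapping and the task count of 8 are assumptions for illustration only; how NAMD actually assigns tasks to devices is not described here.

```python
from collections import defaultdict


def assign_tasks_to_gpus(n_tasks, n_gpus):
    """Map CPU task ids to GPU ids round-robin (an assumed policy, not NAMD's)."""
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    sharing = defaultdict(list)
    for task in range(n_tasks):
        sharing[task % n_gpus].append(task)
    return dict(sharing)


# assumption: 8 CPU tasks on the node
four_single_gpu_cards = assign_tasks_to_gpus(8, 4)   # e.g. 4 cards with 1 GPU each
four_dual_gpu_cards = assign_tasks_to_gpus(8, 8)     # e.g. 4 cards with 2 GPUs each

print(four_single_gpu_cards)
print(four_dual_gpu_cards)
```

with four single-GPU cards two tasks share each GPU, which can help fill the idle gaps; with four dual-GPU cards every task gets its own, slower GPU. which of the two wins is exactly the kind of question only a benchmark can settle.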
ultimately, there is nothing but running realistic benchmarks that
can tell you whether you will get what you are looking for or not.
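a minimal sketch of such a benchmark harness follows. the command is a harmless stand-in so the script runs anywhere; the helper name and the repeat count are arbitrary choices, and you would substitute the actual launch line for your own input deck.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True makes a crashed run raise instead of being timed silently
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# stand-in command (assumption): replace with your real simulation launch line
cmd = [sys.executable, "-c", "pass"]
runs = time_command(cmd)

print(f"median {statistics.median(runs):.3f} s, "
      f"min {min(runs):.3f} s, max {max(runs):.3f} s over {len(runs)} runs")
```

repeating the run and looking at the spread, not just a single number, also gives a first hint of how stable the machine is under load.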
--
Dr. Axel Kohlmeyer akohlmey_at_gmail.com
Institute for Computational Molecular Science
College of Science and Technology
Temple University, Philadelphia PA, USA.