OpenCL and AMD GPUs

From: Aron Broom (broomsday_at_gmail.com)
Date: Tue Nov 22 2011 - 22:04:13 CST

I'd like to present an idea for a future NAMD feature: support for
OpenCL. I think this is already being considered to some extent, but I
want to lay out the full case for it.

My understanding is that at the moment GPU acceleration of non-bonded force
calculations only takes place using CUDA, and that in general this means
nVidia GPU cards. I searched through the mailing list archives and
couldn't find much discussion on this topic, although there was a post from
VMD developer John Stone (
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/14707.html)
suggesting that OpenCL needs to mature a bit more before it will be
standard between devices and easily implemented. I'd like to make an
argument for why it might be extremely worthwhile to get it working soon.

I've recently been running NAMD 2.8 on some nVidia M2070 boxes. The
objective thus far has been free energy determination using various
methods. At the moment, using an Intel Xeon 6-core CPU, I get 0.65 ns/day
for a 100k atom system using PME and an electrostatic cutoff of 14
angstroms with a timestep of 1 fs and rigid waters (making all bonds rigid
and using SHAKE or RATTLE does not offer a real improvement in my case).
Adding 2 nVidia M2070s to that mix increases performance to 2.54 ns/day,
roughly a 4-fold improvement (1.94 ns/day with 1 M2070). This is quite nice,
but the cost of an M2070 or the newer M2090 is ~$3000.
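
As a quick check of the arithmetic above, here is a minimal Python sketch that
recomputes the speedups from the three ns/day figures quoted in this post. The
variable and function names are only illustrative, and the numbers are the
ones reported for this particular 100k atom system, not general benchmarks.

```python
# Throughput figures quoted in this post, in ns/day
# (100k atom system, NAMD 2.8, PME, 14 angstrom cutoff, 1 fs timestep).
cpu_only = 0.65   # 6-core Xeon alone
one_gpu = 1.94    # Xeon + 1 M2070
two_gpus = 2.54   # Xeon + 2 M2070s


def speedup(accelerated, baseline):
    """Return how many times faster `accelerated` is than `baseline`."""
    return accelerated / baseline


speedup_one = speedup(one_gpu, cpu_only)    # first card relative to CPU only
speedup_two = speedup(two_gpus, cpu_only)   # both cards relative to CPU only
gain_second = speedup(two_gpus, one_gpu)    # extra gain from the second card

print(f"1 GPU vs CPU only:  {speedup_one:.2f}x")
print(f"2 GPUs vs CPU only: {speedup_two:.2f}x")
print(f"2 GPUs vs 1 GPU:    {gain_second:.2f}x")
```

By these figures the first card accounts for most of the gain (about 3x), and
the second card adds roughly another 30% on top of that.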

Now, the consumer graphics cards that are based on the same Fermi chip
(e.g. the nVidia GTX 580) have the same number of processor cores and should be
just as fast as the M2090, but only cost ~$500. Of course these consumer
cards work fine with NAMD as they are fully CUDA supported, but there are 3
problems, 2 of which are minor, while the 3rd is catastrophic. The first
is that the memory available on a GTX 580 is 1.5 GB compared with 6 GB on
the M2090. This doesn't actually matter that much for a large portion of
NAMD tasks. For instance my 100k system (which I think is about middle for
system sizes these days) uses less than 1GB of memory on the GPU. The
second problem is the lack of error correcting code (ECC) on the GTXs. I'm
going to contend that for NAMD this actually isn't that critical. NAMD
uses double precision floating point values in order to avoid accumulating
errors from rounding and this is quite critical for getting the right
answer from a long MD simulation. By contrast, a flipped bit in memory will
cause a single, isolated error in the simulation (most likely an incorrect
velocity), which, thanks to thermostats, will be attenuated as
the simulation progresses rather than being built upon (but you could argue
against me here), and since we generally want to do the final production
simulations in replicate, it matters even less. The last problem, the real
one, is that nVidia, realizing that we might not want to spend 6 times the
price just for the extra memory and ECC, has artificially reduced the
double precision floating point performance of the consumer cards, from
being 1/2 of the single precision value in the M20xx series, to 1/8 in the
GTX cards (http://forums.nvidia.com/index.php?showtopic=164417). This
means that these cards are practically useless for NAMD (thereby forcing
high powered computing initiatives to purchase M20xx cards).

But what about the equivalent AMD cards? These have not been artificially
crippled. If we look at the manufacturer supplied peak performance specs,
an M2090 gives 666 GFlops of double precision performance (
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units).
By comparison a $350 Radeon HD 6970 gives 675 GFlops of double precision
performance, owing to their larger number of stream processors (
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units).
This means that if you could run NAMD on an AMD card you could potentially
get the same performance for ~1/10th of the cost (and the Radeon HD 6970
has 2GB of memory, enough for pretty large systems). Unfortunately, the
AMD cards don't run CUDA. If NAMD could work with OpenCL we could be in a
position where everyone could have a desktop computer with the same
computing performance as a multi-thousand-dollar supercomputer (at least
as far as molecular dynamics with NAMD is concerned).
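
To make the price/performance comparison concrete, here is a small Python
sketch using only the peak double precision figures and approximate prices
quoted in this post. The dictionary layout and function name are just for
illustration. Peak GFlops are manufacturer numbers, so this is an upper bound
on what a real OpenCL port of NAMD might achieve, not a prediction.

```python
# Peak double precision throughput (GFlops) and approximate price (USD),
# both taken from the figures quoted in this post.
cards = {
    "Tesla M2090": {"dp_gflops": 666, "price_usd": 3000},
    "Radeon HD 6970": {"dp_gflops": 675, "price_usd": 350},
}


def gflops_per_dollar(card):
    """Peak double precision GFlops obtained per dollar spent."""
    return card["dp_gflops"] / card["price_usd"]


for name, card in cards.items():
    print(f"{name}: {gflops_per_dollar(card):.2f} DP GFlops per dollar")

# How many times more peak DP throughput per dollar the Radeon offers.
ratio = gflops_per_dollar(cards["Radeon HD 6970"]) / gflops_per_dollar(
    cards["Tesla M2090"]
)
print(f"Radeon HD 6970 advantage: {ratio:.1f}x")
```

On these numbers the Radeon comes out a bit under 9 times better per dollar,
which is in the same ballpark as the rough one-tenth cost figure above.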

Thoughts?

~Aron
