Re: Technical specifications for the V100 and GTX gpu cards

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue Mar 17 2020 - 06:32:59 CDT

On Tue, Mar 17, 2020 at 6:11 AM Souvik Sinha <souvik.sinha893_at_gmail.com>
wrote:

> Hi,
> I am unsure how to choose a GPU card configuration compatible with the
> NAMD application.
>
> Some server models (e.g. DELL) specifically require at least two CPUs for
> V100 GP-GPUs to be functional. I wish to know whether such a technical
> limitation also applies if a GTX1080 card is used instead of a V100. I
> mean, is it a general requirement or specific to the build of a card?
>

it is more of a "political" choice. vendors, especially upper tier vendors,
will not support consumer grade graphics hardware for computing purposes
or outside of hardware designated for desktop use. if they did so, they
would be subject to sanctions from the graphics hardware vendor (e.g. loss
of a "partner" status, which translates into all kinds of competitive
disadvantages). also vendors like nvidia have been systematically and
deliberately limiting access to "enterprise features" in software and
driver support for consumer grade hardware (and in some cases also the
other way around, but that is outside the scope of this discussion).

there are two typical major differences between consumer grade hardware
(GeForce) and computing grade hardware (Tesla). 1) Consumer grade GPU chips
are often just a mid-level version of the GPU chips used in Tesla (or
high-end Quadro) devices and thus have far fewer double precision capable
floating-point units. 2) consumer grade GPU cards often have less RAM on
the card and do not use/support ECC style RAM. Also, enterprise level
hardware is more narrowly selected and more thoroughly tested.

neither of these will keep you from using consumer grade GPUs for NAMD,
since it employs GPU compute kernels that make significant use of
single-precision floating point math and because in most cases non-ECC RAM
will work just fine (people using Tesla GPUs often disable the ECC function
to boost performance and gain a 20% increase in RAM). however, without ECC
there is no way of telling whether a memory error has happened, and you run
a higher risk of getting subtly different results because of a weakness in
some RAM cell resulting in occasional bit flips. The same applies to main
memory as well, but since the memory is pushed more to the limit in
high-end GPUs aimed at graphics for games, the probability is higher. for
games it doesn't matter much if a random single pixel has a slightly wrong
color, whereas for the physics of a simulation it matters much more if
numbers sometimes change randomly.
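
to illustrate why a bit flip is harmless for a pixel but not for a
simulation, here is a minimal python sketch. it is not taken from NAMD; the
function names are made up for this example, and it assumes that values are
stored as IEEE 754 single-precision floats and that a color channel is a
plain 8-bit integer.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, stored as an IEEE 754 float32, with one bit inverted.

    Hypothetical helper for illustration only. Bit 0 is the least
    significant mantissa bit, bits 23..30 are the exponent, bit 31 the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def flip_bit_uint8(value: int, bit: int) -> int:
    """Return an 8-bit color channel value with one bit inverted."""
    if not 0 <= value <= 255:
        raise ValueError("value must be in the range 0..255")
    if not 0 <= bit < 8:
        raise ValueError("bit must be in the range 0..7")
    return value ^ (1 << bit)


if __name__ == "__main__":
    # A coordinate-like number of 1.0 with different single bits flipped.
    for bit in (0, 23, 30, 31):
        print(f"float32 1.0, bit {bit:2d} flipped -> {flip_bit_float32(1.0, bit)!r}")
    # A color channel with its lowest bit flipped changes by one step.
    print(f"uint8 200, bit 0 flipped -> {flip_bit_uint8(200, 0)}")
```

flipping the lowest mantissa bit of 1.0 changes it only in the seventh
decimal place, but flipping bit 23 turns it into 0.5, bit 31 into -1.0, and
bit 30 into infinity. the color value 200 with its lowest bit flipped just
becomes 201, which nobody would notice on a screen.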

given the large difference in prices, the risk-reward assessment is rather
difficult. consumer grade GPUs have very attractive pricing compared to
enterprise grade GPUs and provide a very large amount of computational
power for simulations with NAMD. In recent years (at least until AMD's
latest enterprise CPU generations made the market more competitive), the
situation for CPUs has not been so different: especially Gold and Platinum
type Xeon CPUs have become extremely expensive compared to their consumer
grade counterparts.

> One more thing: if a workstation is planned with one Intel Xeon Gold 5218
> and one GTX1080 or RTX2080, will it be fine to run NAMD?
>

you can run NAMD on pretty much any combination of nvidia GPUs and x86
hardware. however, there are many factors impacting performance and the
risk of the hardware not being capable of handling a continuous high
computational load. for many people a high-end gaming machine can be just
as potent and suitable as a significantly higher priced workstation with
enterprise grade hardware. in many cases the choice of which hardware to
pick is not so much governed by the price-performance numbers, but rather
by the lack of staffing to handle the increased workload from using
consumer grade hardware on a large scale. for a single machine, the
difference is often negligible. but once you have to operate tens or
hundreds of machines for sustained computing, the cost of the hardware
becomes less important (it still hurts to have to spend so much more for
rather little gain in absolute performance), because the more hardware you
have, the more the additional effort to manage consumer grade hardware for
HPC use will manifest itself. This is not limited to GPUs, but applies to
all kinds of computing hardware.

HTH,
    Axel.

p.s.: you should dig through the archives of this mailing list and you'll
find many discussions about how to run NAMD efficiently on all kinds of
GPU/CPU hardware choices. however, the choice of how high a risk you are
willing to take is yours. as mentioned above, the rewards can be quite
high, but you will then usually have to resolve technical issues yourself
if you operate hardware combinations not sanctioned/supported by a vendor.

>
> Thank you.
>
> --
> Souvik Sinha
> Research Fellow
> Division of Bioinformatics
> Bose Institute, Kolkata
>
> Contact: 033 25693275
>

-- 
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.
