Re: Configuration of new HPC Cluster for NAMD and GROMACS

From: Vogel, Alexander (Alexander.Vogel_at_medizin.uni-leipzig.de)
Date: Tue Aug 01 2017 - 05:45:57 CDT

Dear Josh,

thanks a lot for your answer...that was really helpful. It gives me a very good starting point for estimating the amount of CPU needed to fuel a GPU.

Also, I want to say that the performance you get is really impressive...with a 1 fs timestep that comes to about 8.5 ns/day. I found apoa1 benchmarks for NAMD 2.12 (https://hpc.nih.gov/apps/namd/) where they get only about 44% better performance with hardware that is at least twice as powerful (28 cores vs. your 10 cores, one K80 graphics card (8.7 TFLOPS) vs. your Quadro M5000 (4.3 TFLOPS)). So NAMD 2.13 seems to once again significantly improve performance on GPU clusters.
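
For reference, the conversion from wall-clock seconds per step to simulated nanoseconds per day is a one-line calculation. A minimal sketch in Python, using only numbers quoted in this thread (the 0.010 s/step from your table and a 1 fs timestep):

    # ns/day from wall-clock time per MD step and the integration timestep
    seconds_per_step = 0.010   # wall-clock seconds per step (from the table below)
    timestep_fs = 1.0          # integration timestep in femtoseconds
    steps_per_day = 86400.0 / seconds_per_step       # steps completed in 24 h
    ns_per_day = steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns
    print(f"{ns_per_day:.1f} ns/day")                # -> 8.6, i.e. roughly the 8.5 ns/day above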

But it is good to know that the CPU is still important...we will try to get as much as possible within our budget, especially since we also want to use GROMACS, which, from the benchmarks I found and your experience, seems to be even more CPU-bound.

The third question is now also answered by the benchmarks I found (see link above). They tried to scale the apoa1 benchmark from 1 to 4 nodes and performance actually decreased despite using InfiniBand FDR. So we are probably going to drop the fast interconnect, since our nodes are going to be even faster than theirs.

The answer to the fourth question is also what I expected. I was just told by the manufacturer not to go below 32 GB, because then the savings are rather small and the memory bandwidth would be reduced since not all memory channels would be populated.
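
The channel argument can be illustrated with a back-of-the-envelope calculation; this is only a sketch, and the DDR4-2666 speed and the channel counts here are assumptions rather than our actual configuration:

    # rough peak bandwidth: channels x transfer rate (MT/s) x 8 bytes per 64-bit transfer
    transfers_per_s = 2666e6        # assumed DDR4-2666 DIMMs
    bytes_per_transfer = 8          # one 64-bit channel
    for channels in (2, 4, 6, 8):   # fewer DIMMs -> fewer populated channels
        gb_per_s = channels * transfers_per_s * bytes_per_transfer / 1e9
        print(f"{channels} channels: ~{gb_per_s:.0f} GB/s peak")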

So the only question remaining for me is whether to use 4 GPUs per node and get a few more nodes, or 8 GPUs per node in fewer but faster nodes. It really boils down to whether NAMD scales to 8 GPUs with the CPU power we have. So if anybody happens to know the answer, I would be very grateful.
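
If we get access to test hardware, one way to settle this empirically would be to repeat your core sweep against all 8 GPUs of one node and see where s/step stops improving. A rough sketch in Python; the namd2 +p/+devices invocation assumes a multicore CUDA build, and parsing the "Benchmark time" log line is an assumption about the output format that should be checked against a real log first:

    # Sweep CPU core counts for a fixed set of 8 GPUs and record NAMD's benchmark timing.
    import re
    import subprocess

    devices = "0,1,2,3,4,5,6,7"   # the 8 GPUs in question
    for cores in (8, 16, 32, 64):
        log = subprocess.run(
            ["namd2", f"+p{cores}", "+setcpuaffinity", "+devices", devices, "apoa1.namd"],
            capture_output=True, text=True, check=True,
        ).stdout
        match = re.search(r"Benchmark time:.*?([\d.]+) s/step", log)
        if match:
            print(f"{cores} cores, 8 GPUs: {match.group(1)} s/step")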

Alexander

> Hi Alexander,
>
> On #1, my recollection is that NAMD is still CPU-bound. For the apoa1 benchmark on my own personal desktop (Quadro M5000, E5-2687W), using a CVS build that came after the last of Antti-Pekka's commits to the GPU code, this is what I see in terms of performance based on the number of cores I throw at it:
> Cores, s/step
> 1, 0.055
> 2, 0.028
> 4, 0.015
> 6, 0.010   # performance stagnates; probably GPU-bound at this point
> 8, 0.010
> 10, 0.010
>
> To me, this seems to suggest that each GPU needs between 4 and 8 CPUs just to deal with the work left on the CPU to prevent the GPU from being idle all the time, after which throwing more CPUs at the problem doesn't help since it is GPU-bound. On paper, the ratio you are proposing doesn't seem crazy to me (4ish cores per GPU), but unlike GROMACS, NAMD doesn't warn you if you don't have a good balance between CPU work and GPU work, and my hardware != your proposed hardware, so just take this as what it is: a single datapoint. :)
>
> Also, keep in mind that there are some parts of the NAMD code that CANNOT be run on GPUs (like the alchemical stuff). Honestly, the EPYC sounds like a great place to save money, since then you could feasibly go down to 1-socket motherboards instead of 2-socket ones, in addition to the CPU cost savings.
>
> 2. At a certain point, GROMACS has the same problems as NAMD does, in that there are some things that the CPU does on its own that limit the impact of adding more GPUs. I've never done extensive testing to figure out where that point is unfortunately. Usually though on my desktop the GPU is idle more often, so that would point to fewer GPUs in favor of more CPU threads. AMBER has the opposite problem, and you typically want a ton of GPUs per CPU.
>
> 3. I mean, technically the best efficiency always comes from using a single node at a time, forgetting about any fast interconnects. However, if you need an answer for a project on a tight deadline, I think you'll be kicking yourself for not having the flexibility of just throwing more processors at the problem until it is solved. Gigabit ethernet just doesn't cut it in terms of latency.
>
> 4. 16 GB is probably already overkill, so long as you stay away from really big systems and don't expect to do analysis on the cluster.
>
> -Josh
