Re: Re: CPU vs GPU Question

From: Rafael Bernardi (rcbernardi_at_auburn.edu)
Date: Wed Dec 09 2020 - 23:26:20 CST

Hello Brian,

If what you want is a node to run NAMD3, don't worry so much about the CPUs. To be honest, I would just buy any AMD EPYC, mainly because of the motherboards with PCIe 4.0 support.

I just got a machine with 2x EPYC 7302, so 32 cores in total. I was running some benchmarks today, and using only 2 CPU cores and 2 RTX 3090s with NVLink, I get:

STMV (1.06M atoms):

4 fs timestep, NVE, Amber-like parameters (8 Å cutoff, etc.) = 76 ns/day
4 fs timestep, NPT, Amber-like parameters (8 Å cutoff, etc.) = 52 ns/day
4 fs timestep, NPT, regular (safe) parameters (12 Å cutoff, etc.) = 30 ns/day
2 fs timestep, NPT, regular (safe) parameters (12 Å cutoff, etc.) = 16 ns/day

I hope these numbers give you an idea of what you can do with a new local machine. The other nice thing is that you can have a machine with 4x 3090s, installed as two pairs of NVLinked 3090s. That way you can run two simulations at this performance at the same time.
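For concreteness, this is roughly how the two independent runs could be launched on such a machine (a sketch only; the device indices, core counts, and file names are placeholders, and you would want "CUDASOAintegrate on" in each config to use the NAMD3 GPU-resident code path):

   # copy A on the first NVLinked pair, one CPU core per GPU
   namd3 +p2 +devices 0,1 runA.namd > runA.log &
   # copy B on the second NVLinked pair
   namd3 +p2 +devices 2,3 runB.namd > runB.log &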

Also, I wouldn't use any of the Amber-like parameters with the CHARMM force field in production dynamics. That was done just as a way to compare the speed of NAMD vs AMBER (the software). If you want to read more about that, JC Gumbart wrote a great paper testing these parameters for a membrane system.
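For anyone who wants to see what the two parameter sets above look like, here is a minimal NAMD config sketch (the keywords are standard, but the exact pairlist distances and other values shown are assumptions, not necessarily what I benchmarked):

   # Amber-like, 4 fs (assumes hydrogen masses already repartitioned in the PSF)
   timestep       4.0
   rigidBonds     all
   cutoff         8.0
   switching      off       ;# hard truncation, Amber style
   pairlistdist   10.0
   PME            yes

   # Regular (safe) CHARMM-style nonbonded settings
   cutoff         12.0
   switching      on
   switchdist     10.0
   pairlistdist   13.5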

Best

Rafael


……………………………………………………………………...
Rafael C. Bernardi
Biophysics Cluster - Department of Physics at Auburn University
NIH Center for Macromolecular Modeling & Bioinformatics
rcbernardi_at_auburn.edu
rcbernardi_at_ks.uiuc.edu
www.ks.uiuc.edu/~rcbernardi
+1 (334) 844-4393





From: <owner-namd-l_at_ks.uiuc.edu> on behalf of JC Gumbart <gumbart_at_physics.gatech.edu>
Reply-To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, JC Gumbart <gumbart_at_physics.gatech.edu>
Date: Wednesday, December 9, 2020 at 5:25 PM
To: "McGuire, Kelly" <mcg05004_at_byui.edu>
Cc: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, "Bennion, Brian" <bennion1_at_llnl.gov>
Subject: Re: namd-l: Re: CPU vs GPU Question

I don’t have those cards available myself to compare against, but I wouldn’t be surprised if that was the best it could do. If you can switch to NAMD3 though (depending on your specific needs), you could run four copies of your system and probably get comparable performance for each.
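As a sketch of the multiple-copy approach (device indices and file names are placeholders), each copy would be pinned to its own card with the NAMD3 multicore-CUDA build:

   namd3 +p1 +devices 0 copy0.namd > copy0.log &
   namd3 +p1 +devices 1 copy1.namd > copy1.log &
   namd3 +p1 +devices 2 copy2.namd > copy2.log &
   namd3 +p1 +devices 3 copy3.namd > copy3.log &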

Best,
JC


On Dec 9, 2020, at 5:15 PM, McGuire, Kelly <mcg05004_at_byui.edu> wrote:

Hi JC, I've been using NAMD2 with 4 GPUs. Would you expect that 4x 2080 Tis and 24 CPU cores on one node would only be able to do about 9 ns/day for the 1.4-million-atom system?

Dr. Kelly L. McGuire
PhD Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602

________________________________
From: Gumbart, JC <gumbart_at_physics.gatech.edu>
Sent: Wednesday, December 9, 2020 1:56 PM
To: namd-l_at_ks.uiuc.edu; Bennion, Brian <bennion1_at_llnl.gov>
Cc: McGuire, Kelly <mcg05004_at_byui.edu>
Subject: Re: namd-l: Re: CPU vs GPU Question

For reference, for a 1.45-million atom system with NAMD3, I get 9 ns/day on a V100 with 4-fs time steps and HMR. I recall P100s being ~1/2 the speed, so you’re not too far off my expectation.

I haven’t tried over multiple GPUs or nodes, but I find it’s usually easier to just run multiple copies, one GPU each. Shorter runs but better statistics.

Best,
JC


On Dec 9, 2020, at 11:21 AM, Bennion, Brian <Bennion1_at_llnl.gov> wrote:

Hello Kelly,

I am not well versed in the workload distribution in NAMD3, so anyone out there is welcome to correct me, but I will say that you would need 10 more nodes of the exact setup you are currently using to see the same throughput.

For AMBER, cross-node GPU communication is not recommended, at least for Amber18.

Brian
________________________________
From: McGuire, Kelly <mcg05004_at_byui.edu>
Sent: Tuesday, December 8, 2020 11:14 PM
To: Bennion, Brian <bennion1_at_llnl.gov>; namd-l_at_ks.uiuc.edu
Subject: Re: CPU vs GPU Question

Brian, a follow-up question: did you mean at least 10 more nodes of CPUs only, or can I use multiple nodes with both CPUs and GPUs? Is it true that GPUs don't work well across nodes?

Dr. Kelly L. McGuire
PhD Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602

________________________________
From: Bennion, Brian <bennion1_at_llnl.gov>
Sent: Wednesday, December 2, 2020 4:05 PM
To: namd-l_at_ks.uiuc.edu; McGuire, Kelly <mcg05004_at_byui.edu>
Subject: Re: CPU vs GPU Question

Hello,
You will need at least 10 more nodes to approach the throughput you are accustomed to seeing. That is where MPI and InfiniBand will play the key role in the calculations.
Your sysadmin will be able to tell you whether, and which, MPI builds exist on the cluster.
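For example (a sketch only; the binary name, core count, and nodelist format depend on how NAMD was built on your cluster), a multi-node run with a Charm++ network build looks something like:

   charmrun +p240 ++nodelist ./nodelist namd2 stmv.namd > stmv.log

or, with an MPI build:

   mpirun -np 240 namd2 stmv.namd > stmv.log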

Brian

________________________________
From: owner-namd-l_at_ks.uiuc.edu on behalf of McGuire, Kelly <mcg05004_at_byui.edu>
Sent: Wednesday, December 2, 2020 2:51 PM
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: CPU vs GPU Question

In all of my simulations so far, I have used one node with 4xP100 GPUs and 24 CPU cores. I usually get ~40 ns/day with a system between 75,000 and 150,000 atoms. I am now trying to do a simulation that is 1.4 million atoms. Currently getting ~4 ns/day.

What is the better approach to speeding up this simulation as the atom count scales: more GPUs on one node, or more CPUs across multiple nodes? Where does MPI come into play here?

Dr. Kelly L. McGuire
PhD Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602
