Re: CPU vs GPU Question

From: Bennion, Brian (bennion1_at_llnl.gov)
Date: Wed Dec 09 2020 - 10:21:54 CST

Hello Kelly,

I am not well versed in how NAMD 3 distributes its workload, so I welcome corrections from anyone on the list, but I would say that you need 10 more nodes of the exact setup you are currently using to see the same throughput.

For AMBER, cross-node GPU communication is not recommended, at least for Amber18.

brian
________________________________
From: McGuire, Kelly <mcg05004_at_byui.edu>
Sent: Tuesday, December 8, 2020 11:14 PM
To: Bennion, Brian <bennion1_at_llnl.gov>; namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: Re: CPU vs GPU Question

Brian, a follow-up question: did you mean at least 10 more nodes of CPUs only, or can I use multiple nodes with both CPUs and GPUs? Is it true that GPUs don't work well across nodes?

Dr. Kelly L. McGuire

PhD Biophysics

Department of Physiology and Developmental Biology

Brigham Young University

LSB 3050

Provo, UT 84602

________________________________
From: Bennion, Brian <bennion1_at_llnl.gov>
Sent: Wednesday, December 2, 2020 4:05 PM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; McGuire, Kelly <mcg05004_at_byui.edu>
Subject: Re: CPU vs GPU Question

Hello,
You will need at least 10 more nodes to approach the throughput you are accustomed to seeing. That is where MPI or InfiniBand will play the key role in the calculations.
Your sysadmin will be able to tell you whether MPI is available on the cluster and which implementation it is.
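One way to see why the node count is a lower bound is Amdahl's law (a generic model swapped in here for illustration, not anything NAMD-specific): with a serial or communication fraction s, N nodes give a speedup of N / (sN + 1 - s). The sketch below, with hypothetical function names, searches for the smallest node count whose predicted throughput reaches a target, assuming one node delivers about 4 ns/day on the 1.4-million-atom system. The serial fractions used are made-up values; real ones would have to be measured.

```python
from typing import Optional


def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Amdahl's law, written as N / (s*N + 1 - s) to avoid a nested division."""
    return nodes / (serial_fraction * nodes + 1.0 - serial_fraction)


def nodes_needed(target_ns_per_day: float,
                 one_node_ns_per_day: float,
                 serial_fraction: float,
                 max_nodes: int = 1024) -> Optional[int]:
    """Smallest node count whose predicted throughput meets the target.

    Returns None if the target is not reached within max_nodes. With s > 0,
    Amdahl's law caps the speedup at 1/s no matter how many nodes are added.
    """
    if not 0.0 <= serial_fraction < 1.0:
        raise ValueError("serial_fraction must be in [0, 1)")
    for n in range(1, max_nodes + 1):
        predicted = one_node_ns_per_day * amdahl_speedup(n, serial_fraction)
        if predicted >= target_ns_per_day:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical serial fractions: perfect scaling, small overhead, large overhead.
    for s in (0.0, 0.02, 0.1):
        print(f"s={s}: {nodes_needed(40.0, 4.0, s)} nodes")
```

Under this model, perfect scaling (s = 0) needs 10 nodes in total to get from 4 back to 40 ns/day; a 2% overhead raises that to 13, and a 10% overhead makes the target unreachable at any node count. That sensitivity to overhead is one reason multi-node throughput depends so heavily on the interconnect.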

Brian

________________________________
From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of McGuire, Kelly <mcg05004_at_byui.edu>
Sent: Wednesday, December 2, 2020 2:51 PM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: namd-l: CPU vs GPU Question

In all of my simulations so far, I have used one node with 4x P100 GPUs and 24 CPU cores. I usually get ~40 ns/day for systems of 75,000 to 150,000 atoms. I am now trying to run a 1.4-million-atom simulation and am currently getting ~4 ns/day.
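As a rough sanity check on these numbers, the drop in throughput can be compared with the growth in system size. The sketch below is a back-of-the-envelope Python calculation with hypothetical function names, assuming the cost per step grows roughly linearly with atom count (PME electrostatics scales closer to N log N, so this is only approximate).

```python
def size_ratio(small_atoms: int, large_atoms: int) -> float:
    """How many times larger the new system is than an old one."""
    return large_atoms / small_atoms


def slowdown(old_ns_per_day: float, new_ns_per_day: float) -> float:
    """How many times lower the new throughput is than the old one."""
    return old_ns_per_day / new_ns_per_day


if __name__ == "__main__":
    new_atoms = 1_400_000
    # The earlier systems ranged from 75,000 to 150,000 atoms.
    low = size_ratio(150_000, new_atoms)   # relative to the largest old system
    high = size_ratio(75_000, new_atoms)   # relative to the smallest old system
    observed = slowdown(40.0, 4.0)
    print(f"system is {low:.1f}x to {high:.1f}x larger")
    print(f"throughput dropped {observed:.1f}x")
    # With purely linear cost, the slowdown should fall inside this range.
    print("consistent with linear scaling:", low <= observed <= high)
```

A 10x drop for a system 9 to 19 times larger is about what linear scaling would predict, so under that assumption getting back toward 40 ns/day means adding roughly proportional hardware.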

What is the better approach to speeding up a simulation as the atom count grows: more GPUs on one node, or more CPUs spread across multiple nodes? Where does MPI come into play here?
