Re: namd-l digest V1 #3135

From: Venkatareddy Dadireddy (venkatareddy_at_iisc.ac.in)
Date: Wed Sep 08 2021 - 02:48:33 CDT

Dear Prof. Josh,

Thank you for your explanation.
It seems CUDASOAIntegrate does not support GaMD,
and some advanced features in v3.0 are still in development.
When I use more than one CPU core, the GPU load
drops from ~13% to ~2%.
With CUDASOAIntegrate disabled in v3.0,
the performance of v3.0 is similar to v2.14 for cMD.
v3.0 is many times faster with CUDASOAIntegrate enabled.

Thank you,
Venkat

________________________________
From: owner-namd-l-digest_at_ks.uiuc.edu <owner-namd-l-digest_at_ks.uiuc.edu> on behalf of namd-l digest <owner-namd-l-digest_at_ks.uiuc.edu>
Sent: Wednesday, September 8, 2021 11:37 AM
To: namd-l-digest_at_ks.uiuc.edu <namd-l-digest_at_ks.uiuc.edu>
Subject: namd-l digest V1 #3135

namd-l digest Wednesday, September 8 2021 Volume 01 : Number 3135

In this issue:

    namd-l: GaMD is slower on GPU compared to cMD
    Re: namd-l: GaMD is slower on GPU compared to cMD
    Re: namd-l: GaMD is slower on GPU compared to cMD

----------------------------------------------------------------------

Date: Mon, 6 Sep 2021 16:15:00 +0000
From: Venkatareddy Dadireddy <venkatareddy_at_iisc.ac.in>
Subject: namd-l: GaMD is slower on GPU compared to cMD

Hi,

I am new to NAMD and want to use the GaMD module in NAMD v3.0 alpha 9.
I am following this protocol:
https://miaolab.ku.edu/GaMD/tutorial_namd.html
I am using a tutorial test file (pdb) to get hands-on experience with GaMD in NAMD.
When I run conventional MD (cMD) on my test system, it runs quite fast
(160 ns/day) on a single GPU. But when I use the same system for GaMD,
it takes more than 3 days with the following preparatory steps.

accelMDGcMDPrepSteps 200000
accelMDGcMDSteps 1000000
accelMDGEquiPrepSteps 200000
accelMDGEquiSteps 25000000
timestep 2.0 # fs
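
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It makes two assumptions that are not stated in the message: that the two *PrepSteps values are counted inside accelMDGcMDSteps and accelMDGEquiSteps rather than added to them, and that the reported ">3 days" covers exactly the cMD and equilibration stages listed. The variable names are illustrative only.

```python
# Step counts and timestep copied from the settings quoted above.
CMD_STEPS = 1_000_000       # accelMDGcMDSteps
EQUIL_STEPS = 25_000_000    # accelMDGEquiSteps
TIMESTEP_FS = 2.0           # timestep, in femtoseconds

# Assumption: the *PrepSteps values are part of the totals above,
# so they are not added a second time.
total_steps = CMD_STEPS + EQUIL_STEPS
simulated_ns = total_steps * TIMESTEP_FS * 1e-6  # 1 fs = 1e-6 ns

CMD_RATE_NS_PER_DAY = 160.0  # reported cMD throughput on one GPU
GAMD_WALL_DAYS = 3.0         # reported lower bound on the GaMD wall time

# Wall time the same number of steps would need at the cMD rate.
hours_at_cmd_rate = simulated_ns / CMD_RATE_NS_PER_DAY * 24.0

# The run took more than 3 days, so this is an upper bound on the GaMD rate
# and the ratio below is a lower bound on the slowdown.
max_gamd_rate = simulated_ns / GAMD_WALL_DAYS
min_slowdown = CMD_RATE_NS_PER_DAY / max_gamd_rate

print(f"simulated time:       {simulated_ns:.1f} ns")
print(f"wall time at cMD rate: {hours_at_cmd_rate:.1f} h")
print(f"GaMD rate (at most):   {max_gamd_rate:.1f} ns/day")
print(f"slowdown (at least):   {min_slowdown:.1f}x")
```

Under these assumptions the settings correspond to about 52 ns of simulated time, which would take roughly 8 hours at 160 ns/day, so a wall time above 3 days implies a GaMD rate below about 17 ns/day.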

What I found is that 'CUDASOAintegrate on' accelerates the simulations,
but for the GaMD equilibration and production steps
'CUDASOAintegrate on' is not supported.
With cMD, >90% of the GPU is used, but with GaMD only 13% of
the GPU is utilized.
Please help me solve this problem.

Thank you,
Venkat

------------------------------

Date: Tue, 7 Sep 2021 10:08:00 -0400
From: Josh Vermaas <vermaasj_at_msu.edu>
Subject: Re: namd-l: GaMD is slower on GPU compared to cMD

Hi Venkat,

Welcome to the wonderful world of alpha software. :D The performance you
see for conventional MD on normal GPUs is because it follows a new code
path that has been GPU optimized, and the simulation data doesn't leave
the GPU. Not everything in NAMD works that way, and so sometimes you get
to use the old code path, where the GPU computes only some of the terms
needed, and timestep integration has to happen on the CPU. Even if you
use more than 1 CPU to help accelerate the integration steps, shuffling
data back and forth still limits simulation performance on modern
hardware. So you aren't doing anything wrong per se (you are using more
than 1 CPU, right?), but your performance is going to be much worse
unless your algorithm fits the CUDASOAIntegrate code path.

-Josh

On 9/6/21 12:15 PM, Venkatareddy Dadireddy wrote:
> Hi,
>
> I am new to NAMD and want to use GaMD module in NAMD v3.0 alpha 9.
> I am following the protocol:
> https://miaolab.ku.edu/GaMD/tutorial_namd.html
> I am using some tutorial test file (pdb) to get hands on GaMD/NAMD.
> When I run conventional MD (cMD) on my test system, it runs quite faster
> (160ns/day) on single GPU. But when I use the same system for GaMD,
> it takes >3days with the following preparatory steps.
>
> accelMDGcMDPrepSteps 200000
> accelMDGcMDSteps 1000000
> accelMDGEquiPrepSteps 200000
> accelMDGEquiSteps 25000000
> timestep 2.0 # fs
>
> What I found is that 'CUDASOAintegrate on' accelerates the simulations
> but in case of GaMD equilibration and production steps the
> 'CUDASOAintegrate on' is not supported.
> In case of cMD, >90% GPU is used but in case of GaMD , only 13% of
> GPU is utilized.
> Please help me solving this problem.
>
> Thank you,
> Venkat
>
>
>
>

--
Josh Vermaas

vermaasj_at_msu.edu
Assistant Professor, Plant Research Laboratory and Biochemistry and Molecular
Biology
Michigan State University
https://prl.natsci.msu.edu/people/faculty/josh-vermaas/

------------------------------

End of namd-l digest V1 #3135
*****************************

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST