Re: Compiling namd2 in 2 GPU 2 CPU workstation

From: Vermaas, Josh (vermaasj_at_msu.edu)
Date: Tue Mar 23 2021 - 08:00:52 CDT

If you are running standard equilibrium MD, the only thing that is likely to give you markedly different performance is using NAMD 3.0a9 with CUDASOAIntegrate turned on, ideally with an NVLink connection between the two Quadros.
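
Whether the two Quadros are actually connected by NVLink can be read off the driver's topology matrix. The following is only a sketch: it assumes nvidia-smi is on the PATH and that `nvidia-smi topo -m` marks NVLink connections with entries such as NV1 or NV2. The helper name has_nvlink is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the topology matrix on stdin contains
# an NVLink entry (assumed to look like NV1, NV2, ...).
has_nvlink() {
    grep -Eq '(^|[[:space:]])NV[0-9]+([[:space:]]|$)'
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi topo -m | has_nvlink; then
        echo "NVLink connection reported"
    else
        echo "no NVLink connection reported"
    fi
fi
```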

-Josh

On 3/23/21 1:28 AM, Suryanarayanan Chandrasekaran wrote:
I restarted the system and I think it worked. Attached are the log files from the commands given below.

~/NAMD_Git-2021-03-03_Linux-x86_64-multicore-CUDA/namd2 +p18 +idlepoll step5_production4.inp +setcpuaffinity +isomalloc_sync | tee stmv1.log
~/NAMD_Git-2021-03-03_Linux-x86_64-multicore-CUDA/namd2 +p18 step5_production4.inp | tee stmv.log

Please check them and let me know. If you have any tips to improve the calculation speed, please let me know.

Thanks,
Surya

On Mon, Mar 22, 2021 at 7:17 PM Vermaas, Josh <vermaasj_at_msu.edu> wrote:
I’ve run the multicore method with up to 6 GPUs at once. 2 in principle is just fine. I can confirm that the 2.14 release will run just fine, and I’d start there when trying to figure out what is wrong. If 2.14 is giving you the same error message, I’d try rebooting the workstation, as I have seen instances where that solves weird hardware glitches. If 2.14 works, but the git version doesn’t, then it’s a bug in the development version.

The way I’d test NAMD on your hardware would be something like:

namd2 +p18 stmv.namd | tee stmv.log
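
To compare runs, the per-step timings can be pulled out of each log. This is a minimal sketch, assuming the log contains NAMD's usual "Info: Benchmark time:" lines with the timing given just before the literal "s/step". The helper name namd_benchmarks is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the s/step value from every
# "Benchmark time:" line in a NAMD log.
# Assumes the value is the field immediately before the literal "s/step".
namd_benchmarks() {
    awk '/Benchmark time:/ {
        for (i = 2; i <= NF; i++)
            if ($i == "s/step") print $(i - 1)
    }' "$1"
}

# Usage: only runs if the log from the command above exists.
if [ -f stmv.log ]; then
    namd_benchmarks stmv.log
fi
```

Lower s/step values mean a faster run, so the same helper can be pointed at two logs to compare launch options.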

-Josh

From: Suryanarayanan Chandrasekaran <suryanar_at_ualberta.ca>
Date: Monday, March 22, 2021 at 12:26 AM
To: "Vermaas, Josh" <vermaasj_at_msu.edu>, "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
Subject: Re: namd-l: Compiling namd2 in 2 GPU 2 CPU workstation

I don't think this is a driver issue; the output from nvidia-smi is given below. The installation manual mentions that only one GPU can be used with multicore CUDA on a single node. Mine is a workstation, so the problem might be in identifying the two GPUs. I want to use both CPUs and both GPUs in the workstation.

$ nvidia-smi
Mon Mar 22 09:49:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   37C    P8    18W / 230W |      1MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:73:00.0  On |                  Off |
| 33%   40C    P8    18W / 230W |    652MiB / 16117MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2584      G   /usr/lib/xorg/Xorg                  0MiB |
|    0   N/A  N/A      2911      G   /usr/bin/gnome-shell                0MiB |
|    0   N/A  N/A      8657      G   ...gAAAAAAAAA --shared-files        0MiB |
|    0   N/A  N/A     12099      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     12615      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     13032      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     13527      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     25049      G   /usr/lib/firefox/firefox            0MiB |
|    1   N/A  N/A      2584      G   /usr/lib/xorg/Xorg                484MiB |
|    1   N/A  N/A      2911      G   /usr/bin/gnome-shell               86MiB |
|    1   N/A  N/A      8657      G   ...gAAAAAAAAA --shared-files       19MiB |
|    1   N/A  N/A     12099      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     12615      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     13032      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     13527      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     25049      G   /usr/lib/firefox/firefox           44MiB |
+-----------------------------------------------------------------------------+

On Sun, Mar 21, 2021 at 4:42 AM Vermaas, Josh <vermaasj_at_msu.edu> wrote:
What is the output of nvidia-smi? NAMD isn’t able to ask CUDA how many GPUs there are. Sometimes this is a driver issue, and sometimes this is just a compatibility issue between your hardware and what NAMD expects.
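
A quick way to check whether the driver itself sees both cards, independent of NAMD, is to count the devices that nvidia-smi lists. This is only a sketch: it assumes nvidia-smi is on the PATH and that `nvidia-smi -L` prints one line per device beginning with "GPU <index>:". The helper name count_gpus is invented here.

```shell
#!/bin/sh
# Hypothetical helper: count device lines ("GPU 0: ...") read from stdin.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs visible to the driver: $(nvidia-smi -L | count_gpus)"
fi
```

If this reports every card but NAMD still fails in cudaGetDeviceCount, the cards are at least visible to the driver, and the next things to try are the reboot and the NAMD version comparison.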

-Josh

From: <owner-namd-l_at_ks.uiuc.edu> on behalf of Suryanarayanan Chandrasekaran <suryanar_at_ualberta.ca>
Reply-To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, Suryanarayanan Chandrasekaran <suryanar_at_ualberta.ca>
Date: Saturday, March 20, 2021 at 6:09 AM
To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
Subject: namd-l: Compiling namd2 in 2 GPU 2 CPU workstation

Hello,
I have a workstation with 2 CPUs and 2 GPUs. I downloaded NAMD_Git-2021-03-03_Linux-x86_64-multicore-CUDA and tried to use it with the options below. I want to use all GPUs with a single CPU, or both the CPUs and the GPUs in the workstation.

1) namd2 +p18 +idlepoll stmv.namd +setcpuaffinity +isomalloc_sync
2) namd2 +idlepoll +p4 +devices 0,1 <configfile>

But I get the same error with both:

Charm++> Running on 1 hosts (2 sockets x 10 cores x 2 PUs = 40-way SMP)
Charm++> cpu topology info is gathered in 0.006 seconds.
Info: Built with CUDA version 10010
FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize, line 135
 on Pe 0 (surya): unknown error
FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize, line 135
 on Pe 0 (surya): unknown error

Thanks,

Surya

--
Josh Vermaas
Assistant Professor, MSU-DOE Plant Research Lab and Department of Biochemistry and Molecular Biology
vermaasj_at_msu.edu
https://prl.natsci.msu.edu/people/faculty/josh-vermaas/
