Re: Compiling namd2 in 2 GPU 2 CPU workstation

From: Vermaas, Josh (vermaasj_at_msu.edu)
Date: Mon Mar 22 2021 - 08:47:29 CDT

I’ve run the multicore-CUDA build with up to 6 GPUs at once, so 2 should be fine in principle. I can confirm that the 2.14 release runs fine, and I’d start there when trying to figure out what is wrong. If 2.14 gives you the same error message, I’d try rebooting the workstation, as I have seen instances where that solves weird hardware glitches. If 2.14 works but the git version doesn’t, then it’s a bug in the development version.
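The triage order above can be sketched as a small shell helper. This is a minimal sketch, not an existing NAMD tool: `run_ok` and `triage_namd` are hypothetical names, the two binary paths and the config file are placeholders you supply, and a run counts as failed if it exits nonzero or its log contains "FATAL ERROR" (the marker in the error output later in this thread). The `+p18` value just mirrors Josh's suggested test command.

```shell
#!/bin/sh
# Sketch of the triage order: try the 2.14 release first, then the git build.
# Hypothetical helpers; binary paths and the config file are placeholders.

# Run one namd2 binary on a config, capturing its output in a log file.
# Assumption: a nonzero exit status OR a "FATAL ERROR" line means failure.
run_ok() {
    _bin=$1 _cfg=$2 _log=$3
    "$_bin" +p18 "$_cfg" > "$_log" 2>&1 || return 1
    ! grep -q "FATAL ERROR" "$_log"
}

# Usage: triage_namd RELEASE_BIN DEV_BIN CONFIG
# Returns 0 if both run, 1 if the release fails, 2 if only the git build fails.
triage_namd() {
    release_bin=$1 dev_bin=$2 config=$3
    if ! run_ok "$release_bin" "$config" namd-release.log; then
        echo "2.14 fails too: reboot the workstation and retry"
        return 1
    fi
    if ! run_ok "$dev_bin" "$config" namd-dev.log; then
        echo "2.14 works but the git build fails: bug in the development version"
        return 2
    fi
    echo "both builds run"
}
```

For example, `triage_namd ./NAMD_2.14_Linux-x86_64-multicore-CUDA/namd2 ./NAMD_Git-2021-03-03_Linux-x86_64-multicore-CUDA/namd2 stmv.namd`, where both directory names are assumptions about where the downloads were unpacked.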

The way I’d test NAMD on your hardware would be something like:

namd2 +p18 stmv.namd | tee stmv.log

-Josh

From: Suryanarayanan Chandrasekaran <suryanar_at_ualberta.ca>
Date: Monday, March 22, 2021 at 12:26 AM
To: "Vermaas, Josh" <vermaasj_at_msu.edu>, "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
Subject: Re: namd-l: Compiling namd2 in 2 GPU 2 CPU workstation

I don't think this is a driver issue; the output from nvidia-smi is given below. The installation manual mentions that only one GPU can be used with multicore CUDA on a single node. Since mine is a workstation, the problem might be in identifying the 2 GPUs. I want to use both CPUs and both GPUs in the workstation.

$ nvidia-smi
Mon Mar 22 09:49:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   37C    P8    18W / 230W |      1MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:73:00.0  On |                  Off |
| 33%   40C    P8    18W / 230W |    652MiB / 16117MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2584      G   /usr/lib/xorg/Xorg                  0MiB |
|    0   N/A  N/A      2911      G   /usr/bin/gnome-shell                0MiB |
|    0   N/A  N/A      8657      G   ...gAAAAAAAAA --shared-files        0MiB |
|    0   N/A  N/A     12099      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     12615      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     13032      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     13527      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     25049      G   /usr/lib/firefox/firefox            0MiB |
|    1   N/A  N/A      2584      G   /usr/lib/xorg/Xorg                484MiB |
|    1   N/A  N/A      2911      G   /usr/bin/gnome-shell               86MiB |
|    1   N/A  N/A      8657      G   ...gAAAAAAAAA --shared-files       19MiB |
|    1   N/A  N/A     12099      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     12615      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     13032      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     13527      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     25049      G   /usr/lib/firefox/firefox           44MiB |
+-----------------------------------------------------------------------------+

On Sun, Mar 21, 2021 at 4:42 AM Vermaas, Josh <vermaasj_at_msu.edu> wrote:
What is the output of nvidia-smi? The error means NAMD isn’t able to ask CUDA how many GPUs there are (the cudaGetDeviceCount call fails). Sometimes this is a driver issue, and sometimes it is just a compatibility issue between your hardware and what NAMD expects.
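One way to rule out the driver/compatibility explanation is to compare the CUDA version NAMD was built with against the newest CUDA version the driver supports. NAMD's log prints the former as an integer (`Built with CUDA version 10010` in the error output below), which uses CUDA's encoding of 1000 x major + 10 x minor, i.e. 10.1; `nvidia-smi` prints the latter (`CUDA Version: 11.1` in the output above). A driver can run binaries built for its reported CUDA version or older. The helper names below are hypothetical, and the check only rules out a version mismatch; it says nothing about other causes of the `unknown error`.

```shell
#!/bin/sh
# Hypothetical helpers: is the driver's CUDA version new enough for the
# CUDA version a binary was built with?

# Convert NAMD's "Built with CUDA version NNNNN" integer
# (1000*major + 10*minor) to "major.minor".
cuda_int_to_version() {
    v=$1
    echo "$((v / 1000)).$(( (v % 1000) / 10 ))"
}

# Convert "major.minor" (as printed by nvidia-smi) to the integer form.
# Assumes the string contains exactly one dot, e.g. "11.1".
cuda_version_to_int() {
    major=${1%%.*}
    minor=${1#*.}
    echo $((major * 1000 + minor * 10))
}

# Succeeds if the driver's CUDA version >= the build's CUDA version.
driver_supports_build() {
    build_int=$1      # e.g. 10010 from the NAMD log
    driver_ver=$2     # e.g. 11.1 from nvidia-smi
    [ "$(cuda_version_to_int "$driver_ver")" -ge "$build_int" ]
}

build=10010   # from "Info: Built with CUDA version 10010"
driver=11.1   # from "CUDA Version: 11.1"
if driver_supports_build "$build" "$driver"; then
    echo "CUDA $(cuda_int_to_version "$build") build, driver supports $driver: OK"
else
    echo "driver too old for CUDA $(cuda_int_to_version "$build")"
fi
```

With the values in this thread it prints `CUDA 10.1 build, driver supports 11.1: OK`, which is consistent with this not being a simple driver-version mismatch.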

-Josh

From: <owner-namd-l_at_ks.uiuc.edu> on behalf of Suryanarayanan Chandrasekaran <suryanar_at_ualberta.ca>
Reply-To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, Suryanarayanan Chandrasekaran <suryanar_at_ualberta.ca>
Date: Saturday, March 20, 2021 at 6:09 AM
To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
Subject: namd-l: Compiling namd2 in 2 GPU 2 CPU workstation

Hello,
I have a workstation with 2 CPUs and 2 GPUs. I downloaded NAMD_Git-2021-03-03_Linux-x86_64-multicore-CUDA and tried to run it with the options below. I want to use both GPUs with a single CPU, or both CPUs and both GPUs in the workstation.

1) namd2 +p18 +idlepoll stmv.namd +setcpuaffinity +isomalloc_sync
2) namd2 +idlepoll +p4 +devices 0,1 <configfile>

But I get the same error with both:

Charm++> Running on 1 hosts (2 sockets x 10 cores x 2 PUs = 40-way SMP)
Charm++> cpu topology info is gathered in 0.006 seconds.
Info: Built with CUDA version 10010
FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize, line 135
 on Pe 0 (surya): unknown error
FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize, line 135
 on Pe 0 (surya): unknown error
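The Charm++ topology line above also shows how the `+p` values used in this thread relate to the hardware: 2 sockets times 10 cores is 20 physical cores, and 2 PUs (hardware threads) per core gives the 40-way total. Below is a small sketch that pulls those numbers out of that line; `parse_topology` is a hypothetical name, and the `sed` pattern assumes exactly the wording shown here, which other Charm++ versions may not use.

```shell
#!/bin/sh
# Hypothetical helper: extract sockets, cores per socket and PUs per core
# from a Charm++ topology line such as
#   Charm++> Running on 1 hosts (2 sockets x 10 cores x 2 PUs = 40-way SMP)
# Assumes exactly that wording; prints nothing if the line does not match.
parse_topology() {
    printf '%s\n' "$1" |
        sed -n 's/.*(\([0-9]*\) sockets x \([0-9]*\) cores x \([0-9]*\) PUs.*/\1 \2 \3/p'
}

line='Charm++> Running on 1 hosts (2 sockets x 10 cores x 2 PUs = 40-way SMP)'
set -- $(parse_topology "$line")
sockets=$1 cores=$2 pus=$3

physical=$((sockets * cores))   # physical cores across both CPUs
threads=$((physical * pus))     # hardware threads (PUs) in total
echo "physical cores: $physical, hardware threads: $threads"
```

It prints `physical cores: 20, hardware threads: 40`. By this count, the `+p18` in the suggested stmv test stays within the 20 physical cores, and `+p4` uses four of them.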

Thanks,

Surya

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST