Re: Compiling namd2 in 2 GPU 2 CPU workstation

From: Suryanarayanan Chandrasekaran (suryanar_at_ualberta.ca)
Date: Sun Mar 21 2021 - 23:24:45 CDT

I don't think this is a driver issue; the output from nvidia-smi is given
below. The installation manual mentions that a multicore CUDA build can use
only one GPU on a single node. Since mine is a workstation, the problem
might be in identifying the 2 GPUs. I want to use both CPUs and both GPUs
in the workstation.

$ nvidia-smi
Mon Mar 22 09:49:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   37C    P8    18W / 230W |      1MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:73:00.0  On |                  Off |
| 33%   40C    P8    18W / 230W |    652MiB / 16117MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2584      G   /usr/lib/xorg/Xorg                  0MiB |
|    0   N/A  N/A      2911      G   /usr/bin/gnome-shell                0MiB |
|    0   N/A  N/A      8657      G   ...gAAAAAAAAA --shared-files        0MiB |
|    0   N/A  N/A     12099      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     12615      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     13032      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     13527      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A     25049      G   /usr/lib/firefox/firefox            0MiB |
|    1   N/A  N/A      2584      G   /usr/lib/xorg/Xorg                484MiB |
|    1   N/A  N/A      2911      G   /usr/bin/gnome-shell               86MiB |
|    1   N/A  N/A      8657      G   ...gAAAAAAAAA --shared-files       19MiB |
|    1   N/A  N/A     12099      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     12615      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     13032      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     13527      G   /usr/lib/firefox/firefox            2MiB |
|    1   N/A  N/A     25049      G   /usr/lib/firefox/firefox           44MiB |
+-----------------------------------------------------------------------------+
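
On the version side, the numbers look compatible. The NAMD log quoted
below says "Built with CUDA version 10010", and nvidia-smi above reports
"CUDA Version: 11.1", which is the newest CUDA version the driver supports.
CUDA encodes its version as 1000*major + 10*minor, so 10010 means 10.1. A
minimal sketch of that comparison in Python follows; the helper names are
hypothetical (they are not part of NAMD or CUDA), and the check only
compares version numbers, so it cannot rule out other driver problems.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    major, rest = divmod(encoded, 1000)
    return major, rest // 10


def driver_supports(driver: Tuple[int, int], built_with: Tuple[int, int]) -> bool:
    """True if the driver's newest supported CUDA version covers the build's version."""
    return driver >= built_with


# Values taken from this thread: the NAMD log and the nvidia-smi header.
built_with = decode_cuda_version(10010)  # "Built with CUDA version 10010"
driver = (11, 1)                         # "CUDA Version: 11.1"
print(built_with, driver_supports(driver, built_with))  # (10, 1) True
```

So the driver is new enough for a CUDA 10.1 binary, and the "unknown error"
from cudaGetDeviceCount would have to come from something else.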

On Sun, Mar 21, 2021 at 4:42 AM Vermaas, Josh <vermaasj_at_msu.edu> wrote:

> What is the output of nvidia-smi? NAMD isn’t able to ask CUDA how many
> GPUs there are. Sometimes this is a driver issue, and sometimes this is
> just a compatibility issue between your hardware and what NAMD expects.
>
>
>
> -Josh
>
>
>
> *From: *<owner-namd-l_at_ks.uiuc.edu> on behalf of Suryanarayanan
> Chandrasekaran <suryanar_at_ualberta.ca>
> *Reply-To: *"namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, Suryanarayanan
> Chandrasekaran <suryanar_at_ualberta.ca>
> *Date: *Saturday, March 20, 2021 at 6:09 AM
> *To: *"namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
> *Subject: *namd-l: Compiling namd2 in 2 GPU 2 CPU workstation
>
>
>
> Hello,
>
> I have a workstation with 2 CPUs and 2 GPUs. I downloaded
> NAMD_Git-2021-03-03_Linux-x86_64-multicore-CUDA and tried to compile it
> with the options below. I want to use all GPUs with a single CPU, or both
> CPUs and both GPUs in the workstation.
>
>
>
> 1) namd2 +p18 +idlepoll stmv.namd +setcpuaffinity +isomalloc_sync
>
> 2) namd2 +idlepoll +p4 +devices 0,1 <configfile>
>
> *But I get the same error in both cases:*
>
> Charm++> Running on 1 hosts (2 sockets x 10 cores x 2 PUs = 40-way SMP)
> Charm++> cpu topology info is gathered in 0.006 seconds.
> Info: Built with CUDA version 10010
> FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize, line 135
> on Pe 0 (surya): unknown error
> FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize, line 135
> on Pe 0 (surya): unknown error
>
> Thanks,
>
> Surya
>
>
