Re: Running 2.12

From: Vermaas, Joshua (Joshua.Vermaas_at_nrel.gov)
Date: Mon Oct 16 2017 - 12:07:07 CDT

Update your driver. I'm a little foggy on the specifics, but the NAMD error message is pretty clear. You are only a few minor versions behind the minimum driver for CUDA 8.0 (367.27 < 367.48, see https://github.com/NVIDIA/nvidia-docker/wiki/CUDA), so perhaps the CUDA samples (deviceQuery) are mis-detecting what is actually available.

-Josh
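The check in the reply boils down to comparing the installed driver build (367.27, from the nvidia-smi output quoted below) against the minimum for CUDA 8.0 (367.48). Below is a minimal Python sketch of that comparison, assuming dotted version strings as nvidia-smi prints them; the helper names are hypothetical. It compares each dotted field as an integer, because a plain string comparison breaks as soon as one field gains a digit ("367.100" sorts before "367.48").

```python
# Hypothetical helpers: compare NVIDIA driver version strings numerically.
MIN_DRIVER_CUDA_8_0 = "367.48"  # minimum quoted in the reply above


def parse_driver_version(version: str) -> tuple[int, ...]:
    """Turn a string like '367.27' into a tuple of ints like (367, 27)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_sufficient(installed: str, minimum: str = MIN_DRIVER_CUDA_8_0) -> bool:
    """True if the installed driver is at least the required minimum.

    Tuples compare field by field, so (367, 27) < (367, 48).
    """
    return parse_driver_version(installed) >= parse_driver_version(minimum)


if __name__ == "__main__":
    print(driver_is_sufficient("367.27"))  # the driver on node1
```

With the values from this thread, `driver_is_sufficient("367.27")` is False, while a newer driver such as `"375.26"` passes.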

On 10/13/2017 07:04 PM, Dr. Eddie wrote:
Hi all,
I'm pretty sure I have CUDA 8 for the GTX 1080s, but the NAMD nightly build from a few days ago won't run:
[eackad_at_node1 NAMD_Git-2017-10-11_Linux-x86_64-multicore-CUDA]$ ls
announce.txt flipbinpdb lib libcufft.so.8.0 namd2 psfgen sortreplicas
charmrun flipdcd libcudart.so.8.0 license.txt notes.txt README.txt
[eackad_at_node1 NAMD_Git-2017-10-11_Linux-x86_64-multicore-CUDA]$ ldd namd2
        linux-vdso.so.1 => (0x00007ffc5477a000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f3fda639000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f3fda434000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00007f3fda22c000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00007f3fd9f2a000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f3fd9c21000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f3fd9a0b000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007f3fd9649000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3fda870000)

[eackad_at_node1 NAMD_Git-2017-10-11_Linux-x86_64-multicore-CUDA]$ ./namd2
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 1 threads
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID: v6.8.0-57-g5705b64-namd-charm-6.8.0-build-2017-Sep-12-21260
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (56-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: Built with CUDA version 8000
FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize
 on Pe 0 (node1.cl.siue.edu): CUDA driver version is insufficient for CUDA runtime version
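The number in "Info: Built with CUDA version 8000" uses the CUDA runtime's integer encoding, 1000 × major + 10 × minor, so 8000 means CUDA 8.0. The sketch below decodes such integers and, assuming a Linux machine with the NVIDIA driver installed, asks the driver which CUDA version it supports by calling `cuDriverGetVersion` from `libcuda.so.1` through ctypes; the helper names are hypothetical. On this node the call would presumably return 8000, the same 8.0 that deviceQuery prints further down, so a matching "8.0 / 8.0" does not by itself rule out this error; the reply above points at the driver build number instead.

```python
import ctypes


def decode_cuda_version(version: int) -> tuple[int, int]:
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def query_driver_cuda_version(libname: str = "libcuda.so.1") -> int:
    """Ask the installed driver which CUDA version it supports (Linux only).

    Hypothetical helper; raises OSError if the driver library cannot be loaded.
    """
    libcuda = ctypes.CDLL(libname)
    version = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        raise RuntimeError(f"cuDriverGetVersion failed with CUresult {status}")
    return version.value


if __name__ == "__main__":
    print(decode_cuda_version(8000))  # the "Built with CUDA version 8000" value
    try:
        print(decode_cuda_version(query_driver_cuda_version()))
    except OSError:
        print("libcuda.so.1 not found: no NVIDIA driver on this machine")
```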

But everything below says it is CUDA 8... or am I reading that wrong?

[eackad_at_node1 ~]$ nvidia-smi
Fri Oct 13 19:38:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 0000:02:00.0     Off |                  N/A |
| 27%   28C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 0000:03:00.0     Off |                  N/A |
| 27%   31C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    On   | 0000:82:00.0     Off |                  N/A |
| 27%   30C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    On   | 0000:83:00.0     Off |                  N/A |
| 27%   28C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
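Instead of reading the driver version out of the table by eye, nvidia-smi can print only that field: `nvidia-smi --query-gpu=driver_version --format=csv,noheader` writes one line per GPU. A minimal sketch wrapping that call in Python, assuming the one-version-per-line output just described; the helper names are hypothetical.

```python
import subprocess


def parse_driver_versions(smi_output: str) -> set[str]:
    """Collect the distinct driver versions from one-version-per-line output."""
    return {line.strip() for line in smi_output.splitlines() if line.strip()}


def installed_driver_version() -> str:
    """Run nvidia-smi and return the single driver version it reports."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    versions = parse_driver_versions(output)
    # All GPUs on one host share a driver, so expect exactly one value.
    if len(versions) != 1:
        raise RuntimeError(f"expected one driver version, got {sorted(versions)}")
    return versions.pop()
```

On node1 this would be expected to return "367.27", shared by all four GPUs; if nvidia-smi is not on the PATH, `subprocess.run` raises FileNotFoundError.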

[eackad_at_node1 ~]$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 130 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 3: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 1080 (GPU1) : Yes
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 1080 (GPU2) : No
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 1080 (GPU3) : No
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 (GPU0) : Yes
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 (GPU2) : No
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 (GPU3) : No
> Peer access from GeForce GTX 1080 (GPU2) -> GeForce GTX 1080 (GPU0) : No
> Peer access from GeForce GTX 1080 (GPU2) -> GeForce GTX 1080 (GPU1) : No
> Peer access from GeForce GTX 1080 (GPU2) -> GeForce GTX 1080 (GPU3) : Yes
> Peer access from GeForce GTX 1080 (GPU3) -> GeForce GTX 1080 (GPU0) : No
> Peer access from GeForce GTX 1080 (GPU3) -> GeForce GTX 1080 (GPU1) : No
> Peer access from GeForce GTX 1080 (GPU3) -> GeForce GTX 1080 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = GeForce GTX 1080, Device1 = GeForce GTX 1080, Device2 = GeForce GTX 1080, Device3 = GeForce GTX 1080
Result = PASS

Am I missing something?
Thanks for the help in advance!

--
Eddie

This archive was generated by hypermail 2.1.6 : Sun Dec 31 2017 - 23:21:43 CST