Running 2.12

From: Dr. Eddie (eackad_at_gmail.com)
Date: Fri Oct 13 2017 - 19:56:46 CDT

Hi all,
I'm pretty sure I have CUDA 8 installed for the GTX 1080s, but the NAMD
nightly build from a few days ago won't run:
[eackad_at_node1 NAMD_Git-2017-10-11_Linux-x86_64-multicore-CUDA]$ ls
announce.txt flipbinpdb lib libcufft.so.8.0 namd2 psfgen sortreplicas
charmrun flipdcd libcudart.so.8.0 license.txt notes.txt README.txt
[eackad_at_node1 NAMD_Git-2017-10-11_Linux-x86_64-multicore-CUDA]$ ldd namd2
        linux-vdso.so.1 => (0x00007ffc5477a000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f3fda639000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f3fda434000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00007f3fda22c000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00007f3fd9f2a000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f3fd9c21000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f3fd9a0b000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007f3fd9649000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3fda870000)

[eackad_at_node1 NAMD_Git-2017-10-11_Linux-x86_64-multicore-CUDA]$ ./namd2
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 1 threads
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID: v6.8.0-57-g5705b64-namd-charm-6.8.0-build-2017-Sep-12-21260
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (56-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: Built with CUDA version 8000
FATAL ERROR: CUDA error cudaGetDeviceCount(&deviceCount) in file src/DeviceCUDA.C, function initialize
 on Pe 0 (node1.cl.siue.edu): CUDA driver version is insufficient for CUDA runtime version

But everything below says it is CUDA 8... or am I reading that wrong?
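
For reference, the `8000` in "Built with CUDA version 8000" follows CUDA's integer version encoding (1000 x major + 10 x minor), so it reads as 8.0. A minimal sketch of that decoding is below; the helper name `decode_cuda_version` is hypothetical and is not part of NAMD or the CUDA toolkit. Note that this integer carries no patch level.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a CUDA integer version (1000 * major + 10 * minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


if __name__ == "__main__":
    # 8000 is the value NAMD printed in the log above.
    major, minor = decode_cuda_version(8000)
    print(f"CUDA {major}.{minor}")
```
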

[eackad_at_node1 ~]$ nvidia-smi
Fri Oct 13 19:38:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 0000:02:00.0     Off |                  N/A |
| 27%   28C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 0000:03:00.0     Off |                  N/A |
| 27%   31C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    On   | 0000:82:00.0     Off |                  N/A |
| 27%   30C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    On   | 0000:83:00.0     Off |                  N/A |
| 27%   28C    P8     6W / 180W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

[eackad_at_node1 ~]$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 130 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 3: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version 8.0 / 8.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 8113 MBytes (8507555840 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
  GPU Max Clock rate: 1734 MHz (1.73 GHz)
  Memory Clock rate: 5005 Mhz
  Memory Bus Width: 256-bit
  L2 Cache Size: 2097152 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 1080 (GPU1) : Yes
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 1080 (GPU2) : No
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 1080 (GPU3) : No
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 (GPU0) : Yes
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 (GPU2) : No
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 (GPU3) : No
> Peer access from GeForce GTX 1080 (GPU2) -> GeForce GTX 1080 (GPU0) : No
> Peer access from GeForce GTX 1080 (GPU2) -> GeForce GTX 1080 (GPU1) : No
> Peer access from GeForce GTX 1080 (GPU2) -> GeForce GTX 1080 (GPU3) : Yes
> Peer access from GeForce GTX 1080 (GPU3) -> GeForce GTX 1080 (GPU0) : No
> Peer access from GeForce GTX 1080 (GPU3) -> GeForce GTX 1080 (GPU1) : No
> Peer access from GeForce GTX 1080 (GPU3) -> GeForce GTX 1080 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = GeForce GTX 1080, Device1 = GeForce GTX 1080, Device2 = GeForce GTX 1080, Device3 = GeForce GTX 1080
Result = PASS

Am I missing something?
Thanks for the help in advance!

-- 
Eddie

This archive was generated by hypermail 2.1.6 : Sun Dec 31 2017 - 23:21:43 CST