From: Simon Dürr (simon.duerr_at_uni-konstanz.de)
Date: Wed Sep 02 2015 - 09:42:03 CDT

Hi all,

I'm trying to run an ILS (implicit ligand sampling) calculation on a GPU cluster.
The specs of the machine I'm using:
- Scientific Linux 6.1
- 24 GB RAM
- 8 CPUs
- 7 GPUs (NVIDIA GeForce GTX 580 with 1.5 GB RAM each)

I use VMD 1.9.1 (OpenGL/CUDA enabled) and CUDA driver v6.5.

My system has 60,000 atoms, and the trajectory is a 10 ns equilibration
run with NAMD 2.9. The DCD file contains one frame per ps (10,000 frames).

When I try to run ILS with oxygen as the probe and a subres greater than 1,
CUDA acceleration is not used ("... Exceeded MAX_BINOFFSETS ..., using
CPU ...").
It also seems that only one of the 7 available GPUs is used. VMD detects
all of them, and when I set subres to 1 the calculation does use CUDA,
but only on GPU [0] (frame time ~20 s).

My questions:
Is it possible to use all GPUs for the calculation?
Could it be that the GPU memory (1.5 GB per card) is not sufficient to
run this calculation with CUDA when subres >= 2?
Is there any way to work around this?
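
To make the subres question concrete, here is a minimal sketch of how the sampling work might grow. It assumes that subres is the number of sub-sampling points along each edge of a grid cell, so the points per cell grow as subres cubed. That is my reading of the option, not something I have confirmed in the VMD source, and it says nothing about how MAX_BINOFFSETS itself is computed. The function name is just for illustration.

```python
def points_per_cell(subres):
    """Sub-sampling points per grid cell, ASSUMING subres points per edge."""
    return subres ** 3

# Relative sampling work compared to subres = 1 (assumption, see above).
for subres in (1, 2, 3):
    print(f"subres {subres}: {points_per_cell(subres)} points per cell")
```

If this assumption holds, going from subres 1 to subres 2 already means eight times as many sampling points per cell.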

See the relevant parts of the log below.

Cheers,
Simon

Info) Multithreading available, 8 CPUs detected.
Info) Free system memory: 21775MB (90%)
Info) Creating CUDA device pool and initializing hardware...
Info) Detected 7 available CUDA accelerators:
Info) [0] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
Info) [1] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
Info) [2] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
Info) [3] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
Info) [4] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
Info) [5] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
Info) [6] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP

Info) ILS frame 1/10000
Info) Coord setup: 0.088089 s
Info) Aligning frames.
Info) ComputeOccupancyMap_setup() 0.016058 s
Using CUDA device: 0
***** ERROR: Exceeded MAX_BINOFFSETS for CUDA kernel
Info) vmd_cuda_evaluate_occupancy_map() FAILED, using CPU for calculation
Info) ComputeOccupancyMap: find_distance_exclusions() 0.232957 s
Info) ComputeOccupancyMap: find_energy_exclusions() 329.343837 s -> 6690550 exclusions
Info) ComputeOccupancyMap: compute_occupancy_multiatom() 303.300753 s
Info) ComputeOccupancyMap_calculate_slab() 632.882040 s
Info) Total frame time = 632.988717 s
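
For scale, the frame times above can be turned into a rough total-runtime estimate. This is only back-of-the-envelope arithmetic: it uses the 632.99 s CPU frame time from the log, the ~20 s frame time I see with subres 1 on one GPU, and 10,000 frames. The 7-GPU figure assumes ideal linear scaling with frames split evenly across cards, which is purely hypothetical since I have not managed to use more than one GPU. Note also that the CPU and GPU numbers come from different subres settings, so they are not a like-for-like speedup.

```python
# Rough total-runtime estimate from the per-frame times reported above.
N_FRAMES = 10_000          # 10 ns trajectory, one frame per ps
CPU_FRAME_S = 632.988717   # "Total frame time" from the log (CPU fallback)
GPU_FRAME_S = 20.0         # approximate frame time, subres 1, single GPU
N_GPUS = 7                 # cards detected by VMD

SECONDS_PER_DAY = 86_400


def total_days(frame_seconds, n_frames=N_FRAMES, workers=1):
    """Wall-clock days, assuming independent frames split evenly over workers."""
    return frame_seconds * n_frames / workers / SECONDS_PER_DAY


cpu_days = total_days(CPU_FRAME_S)
gpu_days = total_days(GPU_FRAME_S)
# Hypothetical: ideal linear scaling over all cards (not observed).
ideal_multi_gpu_days = total_days(GPU_FRAME_S, workers=N_GPUS)

print(f"CPU fallback:         {cpu_days:.1f} days")
print(f"One GPU (subres 1):   {gpu_days:.1f} days")
print(f"Seven GPUs (ideal):   {ideal_multi_gpu_days:.1f} days")
```

So the CPU fallback would take roughly 73 days for the whole trajectory, versus a bit over 2 days on a single GPU at subres 1.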