GPU Selection in NAMD CUDA

From: Ron Stubbs (rons_at_duke.edu)
Date: Mon Sep 21 2009 - 14:56:42 CDT

Hi All,

I have a Tesla C1060 installed on my workstation along with a Quadro FX
570 video card.

My problem is that NAMD-CUDA is using the FX 570 instead of the Tesla
card. Is there a way to pass an argument to NAMD-CUDA to select the
desired device? If not, I guess I'll need to download the source and
modify it to use device 0.

I've swapped the cards' slots, but the Tesla is still enumerated as
device 0 and the video card as device 1. NAMD-CUDA appears to select
the device with ID 1.

Here's the relevant excerpt from my output file:

Info: 1 NAMD CVS Linux-x86_64-CUDA 1 ocracoke.pratt.duke.edu rons
Info: Running on 1 processors.
Info: Charm++/Converse parallel runtime startup completed at 0.00278807 s
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 binding to CUDA device 1 on ocracoke.pratt.duke.edu: 'Quadro FX 570' Mem: 255MB Rev: 1.1
Info: 1.62163 MB of memory in use based on CmiMemoryUsage

Here are the results of running deviceQuery:

rons_at_ocracoke:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release> ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA

Device 0: "Tesla C1060"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: "Quadro FX 570"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 1
  Total amount of global memory: 268107776 bytes
  Number of multiprocessors: 2
  Number of cores: 16
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 0.92 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: Yes
  Integrated: No
  Support host page-locked memory mapping: No
  Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED
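As a hypothetical sketch (not NAMD's own selection logic), the deviceQuery listing above could be parsed to pick the compute card automatically. It assumes that "Run time limit on kernels: Yes" usually marks a GPU that is also driving a display, prefers devices without that limit and then the one with the most multiprocessors, and builds a command line with the +devices argument named in the NAMD log message. The function names, the namd2 binary name and the config.namd file name are illustrative placeholders.

```python
import re

# Abbreviated copy of the deviceQuery output quoted above; only the
# fields this sketch reads are kept.
SAMPLE = """\
Device 0: "Tesla C1060"
  Number of multiprocessors: 30
  Run time limit on kernels: No

Device 1: "Quadro FX 570"
  Number of multiprocessors: 2
  Run time limit on kernels: Yes
"""


def parse_devices(text):
    """Return one dict per 'Device N: "name"' block in deviceQuery output."""
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r'Device (\d+): "(.*)"', line)
        if header:
            devices.append({"id": int(header.group(1)), "name": header.group(2),
                            "mps": 0, "watchdog": False})
            continue
        if not devices:
            continue  # skip banner lines before the first device
        mps = re.match(r"Number of multiprocessors: (\d+)", line)
        if mps:
            devices[-1]["mps"] = int(mps.group(1))
        limit = re.match(r"Run time limit on kernels: (Yes|No)", line)
        if limit:
            devices[-1]["watchdog"] = limit.group(1) == "Yes"
    return devices


def pick_compute_device(devices):
    """Prefer no kernel run-time limit, then the most multiprocessors."""
    best = max(devices, key=lambda d: (not d["watchdog"], d["mps"]))
    return best["id"]


def namd_args(device_id, config):
    # "+devices" is the argument named in NAMD's log message; "namd2" and
    # the config file name are hypothetical placeholders.
    return ["namd2", "+devices", str(device_id), config]


if __name__ == "__main__":
    dev = pick_compute_device(parse_devices(SAMPLE))
    print(" ".join(namd_args(dev, "config.namd")))
```

With the sample above, pick_compute_device returns 0 (the Tesla C1060), so the script prints namd2 +devices 0 config.namd. On a live system the SAMPLE string could instead be the stdout of ./deviceQuery, captured for example with subprocess.run(["./deviceQuery"], capture_output=True, text=True).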

Any comments/suggestions would be greatly appreciated.

Thanks,
Ron

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ron Stubbs
Senior Systems Programmer
Research Computing
Pratt School of Engineering
1454A Fitzpatrick Center         Box 90271
Duke University,        Durham, N.C. 27708-0271
office: (919)660-5339   cell:(919)641-5689
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:51:29 CST