understanding which is which with GPU cards

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sat Apr 06 2013 - 17:17:13 CDT

Hello:
Could you please help me understand which GPU card is which on a Linux
box with six processors running NAMD 2.9?

The MD simulation was allocated to the two GPU cards as follows (from the log file):
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Pe 3 physical rank 3 will use CUDA device of pe 4
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 will use CUDA device of pe 2
(where gig64 is the machine name)
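The PE-to-device assignment in the log above can be tabulated with a short script. This is a minimal sketch, assuming the log lines use exactly the "binding to CUDA device" / "will use CUDA device of pe" wording quoted here (NAMD 2.9 output; other versions may word it differently). The names `pe_to_cuda_device` and `pes_on_device` are hypothetical helpers, not part of NAMD.

```python
import re

# Assumed wording: "Pe 2 physical rank 2 binding to CUDA device 0 on gig64: ..."
BIND_RE = re.compile(r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+)")
# Assumed wording: "Pe 1 physical rank 1 will use CUDA device of pe 2"
SHARE_RE = re.compile(r"Pe (\d+) physical rank \d+ will use CUDA device of pe (\d+)")


def pe_to_cuda_device(log_text):
    """Map each PE to the CUDA device index it ends up using (hypothetical helper)."""
    bound = {}   # PE that binds a device itself -> CUDA device index
    shared = {}  # PE that reuses another PE's device -> that owning PE
    for line in log_text.splitlines():
        m = BIND_RE.search(line)
        if m:
            bound[int(m.group(1))] = int(m.group(2))
            continue
        m = SHARE_RE.search(line)
        if m:
            shared[int(m.group(1))] = int(m.group(2))
    mapping = dict(bound)
    for pe, owner in shared.items():
        # Assumes the owning PE's own "binding" line is present in the log.
        mapping[pe] = bound[owner]
    return mapping


def pes_on_device(mapping, device):
    """Return the sorted list of PEs that use a given CUDA device index."""
    return sorted(pe for pe, dev in mapping.items() if dev == device)


# The startup lines quoted above, with the wrapped "Mem:" lines rejoined.
LOG = """\
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Pe 3 physical rank 3 will use CUDA device of pe 4
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 will use CUDA device of pe 2
"""

if __name__ == "__main__":
    mapping = pe_to_cuda_device(LOG)
    for dev in sorted(set(mapping.values())):
        print(f"CUDA device {dev}: PEs {pes_on_device(mapping, dev)}")
```

With the log above this prints `CUDA device 0: PEs [0, 1, 2]` and `CUDA device 1: PEs [3, 4, 5]`, i.e. Pe 2 drives CUDA device 0 and PEs 0 and 1 share it, while Pe 4 drives CUDA device 1 and PEs 3 and 5 share it.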

The card IDs are:
nvidia-smi -L
GPU0 UID 600f64d0-2996-8e71-dca8-8d66f139f772
GPU1 UID 704bb625-95a7-8779-cfdc-14a90e6581fc

Under normal conditions:
nvidia-smi
driver v. 304.48
0 GTX 680 Bus-Id 0000:02:00.0 mem-usage 4% 89MB/2047MB temp 70C
1 GTX 680 Bus-Id 0000:03:00.0 mem-usage 5% 93MB/2047MB temp 69C

FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2 (gig64 device 0): unspecified launch failure.
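The fatal error line names the failing PE and a CUDA device index as NAMD numbers it, the same index used in the "binding to CUDA device N" lines of the log; in this run that is the device bound by Pe 2. Below is a minimal sketch of extracting those fields, assuming the message wording quoted above; `parse_cuda_error` is a hypothetical helper. Note that the index comes from the CUDA runtime, whose default enumeration order is not guaranteed to match nvidia-smi's (nvidia-smi lists GPUs by PCI bus ID). Equating "device 0" with nvidia-smi's GPU 0 at Bus-Id 0000:02:00.0 is therefore an assumption. Comparing PCI bus IDs, for example via the CUDA runtime's `cudaDeviceGetPCIBusId`, is one way to confirm it.

```python
import re

# Assumed wording: "CUDA error in cuda_check_remote_progress on Pe 2 (gig64 device 0)"
ERROR_RE = re.compile(r"CUDA error in (\w+) on Pe (\d+) \((\S+) device (\d+)\)")


def parse_cuda_error(message):
    """Return (function, pe, host, cuda_device), or None if the text does not match."""
    # Collapse line wrapping (e.g. from e-mail) so the pattern sees a single line.
    m = ERROR_RE.search(" ".join(message.split()))
    if m is None:
        return None
    func, pe, host, device = m.groups()
    return func, int(pe), host, int(device)


if __name__ == "__main__":
    msg = ("FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2 "
           "(gig64 device\n0):\nunspecified launch failure.")
    print(parse_cuda_error(msg))
```

For the message quoted above this prints `('cuda_check_remote_progress', 2, 'gig64', 0)`.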

To which card does the error refer?

Thanks for any advice about this error, which appeared to have been fixed
but reappeared the next day.

francesco pietra
