Re: Multi node run causes "CUDA error cudaStreamCreate"

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Apr 01 2011 - 16:10:00 CDT

Next message: Michael S. Sellers (Cont, ARL/WMRD): "Re: Multi node run causes "CUDA error cudaStreamCreate""
Previous message: Michael S. Sellers (Cont, ARL/WMRD): "Re: Multi node run causes "CUDA error cudaStreamCreate""
In reply to: Michael S. Sellers (Cont, ARL/WMRD): "Re: Multi node run causes "CUDA error cudaStreamCreate""
Next in thread: Michael S. Sellers (Cont, ARL/WMRD): "Re: Multi node run causes "CUDA error cudaStreamCreate""
Reply: Michael S. Sellers (Cont, ARL/WMRD): "Re: Multi node run causes "CUDA error cudaStreamCreate""
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

On Fri, Apr 1, 2011 at 4:46 PM, Michael S. Sellers (Cont, ARL/WMRD)
<michael.s.sellers.ctr_at_us.army.mil> wrote:
> Axel,
>
> Thanks for the help. See below for the output of your suggestion.
>
> The following, from a multi node NAMD startup does not seem right:
>
> Pe 9 physical rank 1 binding to CUDA device 1 on n0: 'Tesla T10 Processor'
> Mem: 4095MB Rev: 1.3
> Pe 7 physical rank 3 binding to CUDA device 1 on n1: 'Tesla T10 Processor'
> Mem: 4095MB Rev: 1.3
>
> Should Pe 9 bind to 'CUDA device 0 on n2'? The pool is node{0-2},
> Pe{0-11}, CUDA device{0,1}/node.

no. the sharing seems right.
with two devices per node, all even PEs should get device 0
and the odd PEs get device 1.
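
a minimal sketch of that mapping, assuming the device is picked as the
PE's physical rank on its node modulo the number of devices. that rule is
consistent with the startup log, but it is not taken from the NAMD source,
and bind_device() is a hypothetical helper, not a NAMD function:

```python
# Sketch of the round-robin binding seen in the startup log: the PE's
# physical rank on its node selects a device modulo the device count.
# bind_device() is a hypothetical helper, not part of NAMD.

def bind_device(physical_rank: int, devices_per_node: int = 2) -> int:
    """Return the CUDA device index for a PE with this rank on its node."""
    return physical_rank % devices_per_node

# (pe, physical rank, node) triples copied from the startup output
log_entries = [(9, 1, "n0"), (7, 3, "n1"), (2, 2, "n2"), (0, 0, "n2")]

for pe, rank, node in log_entries:
    print(f"Pe {pe} physical rank {rank} -> device {bind_device(rank)} on {node}")
```

the device index is always local to the node the PE runs on, so "device 1
on n0" for Pe 9 is what you would expect.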

ECC is also not an issue, since you have a GT200-class device without ECC support.

the next possibility is that the GPUs are configured for
"compute-exclusive" mode, i.e. only one process at a
time can use a given GPU. some admins set this
when the batch system allows multiple jobs to share
a node, to reduce the risk of two jobs accessing
the same GPU while other GPUs sit idle. since your job
binds two PEs to each GPU, that mode could explain the error.
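
to see which GPUs would be affected, here is a small sketch that counts
how many PEs bind to each (node, device) pair. the bindings are copied
from the startup log; shared_devices() is a hypothetical helper, and the
script only counts sharing, it does not query the actual compute mode:

```python
from collections import Counter

# (pe, node, device) bindings copied from the NAMD startup output
bindings = [
    (0, "n2", 0), (1, "n2", 1), (2, "n2", 0), (3, "n2", 1),
    (4, "n1", 0), (5, "n1", 1), (6, "n1", 0), (7, "n1", 1),
    (8, "n0", 0), (9, "n0", 1), (10, "n0", 0), (11, "n0", 1),
]

def shared_devices(bindings):
    """Return {(node, device): count} for GPUs bound by more than one PE."""
    counts = Counter((node, dev) for _pe, node, dev in bindings)
    return {gpu: n for gpu, n in counts.items() if n > 1}

for (node, dev), n in sorted(shared_devices(bindings).items()):
    print(f"{node} device {dev}: {n} processes share this GPU")
```

with 4 PEs and 2 GPUs per node every GPU is shared by two processes, so
any GPU that is in compute-exclusive mode would reject at least one of them.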

cheers,
axel.

>
> -Mike
>
> ________________________________________________________________________
>
> Output of 'nvidia-smi -r' for several nodes:
>
> ECC is not supported by GPU 0
> ECC is not supported by GPU 1
>
>
> Output of 'nvidia-smi -a':
>
> ==============NVSMI LOG==============
>
>
> Timestamp :
> Unit 0:
> Product Name : NVIDIA Tesla S1070 -500
> Product ID :
> Serial Number :
> Firmware Ver : 3.6
> Intake Temperature : 15 C
> GPU 0:
> Product Name : Tesla T10 Processor
> Serial : Not available
> PCI ID : 5e710de
> Bridge Port : 0
> Temperature : 31 C
> GPU 1:
> Product Name : Tesla T10 Processor
> Serial : Not available
> PCI ID : 5e710de
> Bridge Port : 2
> Temperature : 29 C
> Fan Tachs:
> #00: 3636 Status: NORMAL
> #01: 3462 Status: NORMAL
> #02: 3664 Status: NORMAL
> #03: 3376 Status: NORMAL
> #04: 3598 Status: NORMAL
> #05: 3582 Status: NORMAL
> #06: 3688 Status: NORMAL
> #07: 3474 Status: NORMAL
> #08: 3664 Status: NORMAL
> #09: 3488 Status: NORMAL
> #10: 3658 Status: NORMAL
> #11: 3412 Status: NORMAL
> #12: 3682 Status: NORMAL
> #13: 3578 Status: NORMAL
> PSU:
> Voltage : 11.99 V
> Current : 15.64 A
> State : Normal
> LED:
> State : GREEN
>
> Axel Kohlmeyer wrote:
>>
>> i just had this kind of error myself.
>>
>> check your GPUs with: nvidia-smi -a
>> could be that one of them has ECC errors and then NAMD
>> (rightfully so) refuses to use the device.
>>
>> axel
>>
>> On Fri, Apr 1, 2011 at 1:32 PM, Michael S. Sellers (Cont, ARL/WMRD)
>> <michael.s.sellers.ctr_at_us.army.mil> wrote:
>>
>>>
>>> All,
>>>
>>> I am receiving a "FATAL ERROR: CUDA error cudaStreamCreate on Pe 7 (n1
>>> device 1): no CUDA-capable device is available" when NAMD starts up and
>>> is optimizing FFT steps, for a job running on 3 nodes, 4ppn, 2 Teslas
>>> per node.
>>>
>>> The command I'm executing within a PBS script is:
>>> ~/software/bin/charmrun +p12 ~/software/bin/namd2 +idlepoll sim1.conf > $PBS_JOBNAME.out
>>>
>>> NAMD CUDA does not give this error on 1 node, 8ppn, 2 Teslas. Please see
>>> output below.
>>>
>>> Might this be a situation where I need to use the +devices flag? It seems
>>> as though the PEs are binding to CUDA devices on other nodes.
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>>
>>> Charm++> Running on 3 unique compute nodes (8-way SMP).
>>> Charm++> cpu topology info is gathered in 0.203 seconds.
>>> Info: NAMD CVS-2011-03-22 for Linux-x86_64-MPI-CUDA
>>> Info:
>>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>>> Info: for updates, documentation, and support information.
>>> Info:
>>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>>> Info: in all publications reporting results obtained with NAMD.
>>> Info:
>>> Info: Based on Charm++/Converse 60303 for mpi-linux-x86_64
>>> Info: 1 NAMD CVS-2011-03-22 Linux-x86_64-MPI-CUDA
>>> Info: Running on 12 processors, 12 nodes, 3 physical nodes.
>>> Info: CPU topology information available.
>>> Info: Charm++/Converse parallel runtime startup completed at 0.204571 s
>>> Pe 2 sharing CUDA device 0 first 0 next 0
>>> Did not find +devices i,j,k,... argument, using all
>>> Pe 2 physical rank 2 binding to CUDA device 0 on n2: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 3 sharing CUDA device 1 first 1 next 1
>>> Pe 3 physical rank 3 binding to CUDA device 1 on n2: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 0 sharing CUDA device 0 first 0 next 2
>>> Pe 0 physical rank 0 binding to CUDA device 0 on n2: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 9 sharing CUDA device 1 first 9 next 11
>>> Pe 7 sharing CUDA device 1 first 5 next 5
>>> Pe 5 sharing CUDA device 1 first 5 next 7
>>> Pe 9 physical rank 1 binding to CUDA device 1 on n0: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 7 physical rank 3 binding to CUDA device 1 on n1: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 5 physical rank 1 binding to CUDA device 1 on n1: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 10 sharing CUDA device 0 first 8 next 8
>>> Pe 11 sharing CUDA device 1 first 9 next 9
>>> Pe 8 sharing CUDA device 0 first 8 next 10
>>> Pe 11 physical rank 3 binding to CUDA device 1 on n0: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 10 physical rank 2 binding to CUDA device 0 on n0: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 8 physical rank 0 binding to CUDA device 0 on n0: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 6 sharing CUDA device 0 first 4 next 4
>>> Pe 6 physical rank 2 binding to CUDA device 0 on n1: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 1 sharing CUDA device 1 first 1 next 3
>>> Pe 1 physical rank 1 binding to CUDA device 1 on n2: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Pe 4 sharing CUDA device 0 first 4 next 6
>>> Pe 4 physical rank 0 binding to CUDA device 0 on n1: 'Tesla T10 Processor'
>>> Mem: 4095MB Rev: 1.3
>>> Info: 51.4492 MB of memory in use based on /proc/self/stat
>>> ...
>>> ...
>>> Info: PME MAXIMUM GRID SPACING 1.5
>>> Info: Attempting to read FFTW data from FFTW_NAMD_CVS-2011-03-22_Linux-x86_64-MPI-CUDA.txt
>>> Info: Optimizing 6 FFT steps. 1...FATAL ERROR: CUDA error cudaStreamCreate on Pe 7 (n1 device 1): no CUDA-capable device is available
>>> ------------- Processor 7 Exiting: Called CmiAbort ------------
>>> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 7 (n1 device 1): no CUDA-capable device is available
>>>
>>> [7] Stack Traceback:
>>> [7:0] CmiAbort+0x59 [0x907f64]
>>> [7:1] _Z8NAMD_diePKc+0x4a [0x4fa7ba]
>>> [7:2] _Z13cuda_errcheckPKc+0xdf [0x624b5f]
>>> [7:3] _Z15cuda_initializev+0x2a7 [0x624e27]
>>> [7:4] _Z11master_initiPPc+0x1a1 [0x500a11]
>>> [7:5] main+0x19 [0x4fd489]
>>> [7:6] __libc_start_main+0xf4 [0x32ca41d994]
>>> [7:7] cos+0x1d1 [0x4f9d99]
>>> FATAL ERROR: CUDA error cudaStreamCreate on Pe 9 (n0 device 1): no CUDA-capable device is available
>>> ------------- Processor 9 Exiting: Called CmiAbort ------------
>>> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 9 (n0 device 1): no CUDA-capable device is available
>>>
>>> [9] Stack Traceback:
>>> [9:0] CmiAbort+0x59 [0x907f64]
>>> [9:1] _Z8NAMD_diePKc+0x4a [0x4fa7ba]
>>> [9:2] _Z13cuda_errcheckPKc+0xdf [0x624b5f]
>>> [9:3] _Z15cuda_initializev+0x2a7 [0x624e27]
>>> [9:4] _Z11master_initiPPc+0x1a1 [0x500a11]
>>> [9:5] main+0x19 [0x4fd489]
>>> [9:6] __libc_start_main+0xf4 [0x32ca41d994]
>>> [9:7] cos+0x1d1 [0x4f9d99]
>>> FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (n1 device 1): no CUDA-capable device is available
>>> ------------- Processor 5 Exiting: Called CmiAbort ------------
>>> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (n1 device 1): no CUDA-capable device is available
>>> ..
>>> ..
>>> ..

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.


This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:56:54 CST