Re: AW: NAMD 2.11 on CRAY XK with CPU+GPU

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Tue Dec 06 2016 - 15:54:31 CST

The GPU binding information is only printed for the first two physical
nodes; the remaining nodes should be similar. Every process must find a
GPU or NAMD will exit. The only risk, on clusters with multiple GPUs per
node, is that one GPU may be down; in that case "+devices 0,1" or similar
can be specified rather than simply binding to all GPUs found.
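
For example, on a system with one GPU and 16 cores per node, a launch line
along the following lines could be used to bind every process to device 0
explicitly (a sketch only: the aprun options, namd2 path, and input file
name shown here are assumptions that depend on the site's SMP build and
scheduler, not part of the original report):

   aprun -n 10 -N 1 -d 16 ./namd2 +ppn 15 +pemap 1-15 +commap 0 \
         +devices 0 sim.conf

Here "+devices 0" tells each process which CUDA device to use instead of
relying on automatic detection; on multi-GPU nodes a list such as
"+devices 0,1" restricts NAMD to the healthy devices.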

Jim

On Thu, 1 Dec 2016, Norman Geist wrote:

> Sometimes the output floods too fast, so the other GPU detection lines were
> just overwritten in the output buffer. Generally, NAMD cannot run on GPUs if
> not all nodes have GPUs, so it is most likely just working as expected.
>
> Norman Geist
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> behalf of P.-L. Chau
>> Sent: Thursday, 1 December 2016 00:47
>> To: namd-l_at_ks.uiuc.edu
>> Subject: namd-l: NAMD 2.11 on CRAY XK with CPU+GPU
>>
>> I would like to ask about using CPUs and GPUs on a CRAY XK7 to run NAMD
>> 2.11.
>>
>> I use the pre-installed NAMD 2.11 from the supercomputing centre, which is
>> already GPU-enabled. I have checked with the help desk and they assure me
>> that my job submission script would automatically activate each GPU linked
>> to each CPU.
>>
>> I have requested 10 nodes, each with 16 cores, and each node is also linked
>> to a GPU. The NAMD output comes out with this:
>>
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> cpu affinity enabled.
>> Charm++> cpuaffinity PE-core map : 1-15
>> Charm++> set comm 0 on node 0 to core #0
>> Charm++> Running on 10 unique compute nodes (16-way SMP).
>> Info: Built with CUDA version 7000
>> Pe 24 physical rank 9 will use CUDA device of pe 15
>> Pe 29 physical rank 14 will use CUDA device of pe 15
>> Pe 3 physical rank 3 will use CUDA device of pe 8
>> Pe 5 physical rank 5 will use CUDA device of pe 8
>> Pe 6 physical rank 6 will use CUDA device of pe 8
>> Pe 14 physical rank 14 will use CUDA device of pe 8
>> Pe 13 physical rank 13 will use CUDA device of pe 8
>> Pe 1 physical rank 1 will use CUDA device of pe 8
>> Pe 2 physical rank 2 will use CUDA device of pe 8
>> Pe 4 physical rank 4 will use CUDA device of pe 8
>> Pe 0 physical rank 0 will use CUDA device of pe 8
>> Pe 12 physical rank 12 will use CUDA device of pe 8
>>
>> and then subsequently this:
>>
>> Pe 11 physical rank 11 will use CUDA device of pe 8
>> Pe 7 physical rank 7 will use CUDA device of pe 8
>> Pe 10 physical rank 10 will use CUDA device of pe 8
>> Pe 9 physical rank 9 will use CUDA device of pe 8
>> Pe 8 physical rank 8 binding to CUDA device 0 on physical node 0: 'Tesla K20X' Mem: 5759MB Rev: 3.5
>> CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level
>> notification) but not using node-level queue
>> Pe 27 physical rank 12 will use CUDA device of pe 15
>> Pe 26 physical rank 11 will use CUDA device of pe 15
>> Pe 28 physical rank 13 will use CUDA device of pe 15
>> Pe 25 physical rank 10 will use CUDA device of pe 15
>> Pe 23 physical rank 8 will use CUDA device of pe 15
>> Pe 22 physical rank 7 will use CUDA device of pe 15
>> Pe 21 physical rank 6 will use CUDA device of pe 15
>> Pe 20 physical rank 5 will use CUDA device of pe 15
>> Pe 19 physical rank 4 will use CUDA device of pe 15
>> Pe 18 physical rank 3 will use CUDA device of pe 15
>> Pe 16 physical rank 1 will use CUDA device of pe 15
>> Pe 17 physical rank 2 will use CUDA device of pe 15
>> Pe 15 physical rank 0 binding to CUDA device 0 on physical node 1: 'Tesla K20X' Mem: 5759MB Rev: 3.5
>>
>> Does this mean that only 2 GPUs were engaged for this 10-node job? If so,
>> how do I recruit the other 8 GPUs? If not, then how come the output log
>> only mentions two GPUs?
>>
>> Thank you very much.
>>
>> P-L Chau
>>
>> Institut Pasteur
>> Paris, France
>
>
>
