Re: NAMD 2.11 on CRAY XK with CPU+GPU

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Dec 01 2016 - 01:04:38 CST

Sometimes the output floods too fast, so the other GPU detection lines were
simply overwritten in the output buffer. Generally, NAMD cannot run on GPUs
unless all nodes have GPUs. So it is most likely just working as expected.
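A quick way to check this, rather than relying on the scrolling startup output, is to count the "binding to CUDA device" lines in the saved NAMD log; each bound GPU prints exactly one such line. This is only a sketch: the file name namd_sample.log is a stand-in, and the here-document reproduces two lines from the log quoted below; point grep at your real output file instead.

```shell
# Sketch: count how many GPUs a NAMD run actually bound.
# namd_sample.log is a hypothetical stand-in for your real log file;
# the two lines below are copied from the output quoted in this thread.
cat > namd_sample.log <<'EOF'
Pe 8 physical rank 8 binding to CUDA device 0 on physical node 0: 'Tesla K20X'
Pe 15 physical rank 0 binding to CUDA device 0 on physical node 1: 'Tesla K20X'
EOF

# One "binding to CUDA device" line is printed per bound GPU, so on
# 10 single-GPU nodes you would expect this count to be 10.
grep -c 'binding to CUDA device' namd_sample.log
```

If the count matches the number of GPU nodes requested, all GPUs were engaged even though their announcement lines never appeared on screen.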

Norman Geist

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of P.-L. Chau
> Sent: Thursday, December 1, 2016 00:47
> To: namd-l_at_ks.uiuc.edu
> Subject: namd-l: NAMD 2.11 on CRAY XK with CPU+GPU
>
> I would like to ask about using CPUs and GPUs on a CRAY XK7 to run NAMD
> 2.11.
>
> I use the pre-installed NAMD 2.11 from the supercomputing centre, which is
> already GPU-enabled. I have checked with the help desk and they assure me
> that my job submission script would automatically activate each GPU linked
> to each CPU.
>
> I have requested 10 nodes, each with 16 cores, and each node also has one
> GPU attached. The NAMD output comes out with this:
>
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> cpu affinity enabled.
> Charm++> cpuaffinity PE-core map : 1-15
> Charm++> set comm 0 on node 0 to core #0
> Charm++> Running on 10 unique compute nodes (16-way SMP).
> Info: Built with CUDA version 7000
> Pe 24 physical rank 9 will use CUDA device of pe 15
> Pe 29 physical rank 14 will use CUDA device of pe 15
> Pe 3 physical rank 3 will use CUDA device of pe 8
> Pe 5 physical rank 5 will use CUDA device of pe 8
> Pe 6 physical rank 6 will use CUDA device of pe 8
> Pe 14 physical rank 14 will use CUDA device of pe 8
> Pe 13 physical rank 13 will use CUDA device of pe 8
> Pe 1 physical rank 1 will use CUDA device of pe 8
> Pe 2 physical rank 2 will use CUDA device of pe 8
> Pe 4 physical rank 4 will use CUDA device of pe 8
> Pe 0 physical rank 0 will use CUDA device of pe 8
> Pe 12 physical rank 12 will use CUDA device of pe 8
>
> and then subsequently this:
>
> Pe 11 physical rank 11 will use CUDA device of pe 8
> Pe 7 physical rank 7 will use CUDA device of pe 8
> Pe 10 physical rank 10 will use CUDA device of pe 8
> Pe 9 physical rank 9 will use CUDA device of pe 8
> Pe 8 physical rank 8 binding to CUDA device 0 on physical node 0: 'Tesla
> K20X' Mem: 5759MB Rev: 3.5
> CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level
> notification) but not using node-level queue
> Pe 27 physical rank 12 will use CUDA device of pe 15
> Pe 26 physical rank 11 will use CUDA device of pe 15
> Pe 28 physical rank 13 will use CUDA device of pe 15
> Pe 25 physical rank 10 will use CUDA device of pe 15
> Pe 23 physical rank 8 will use CUDA device of pe 15
> Pe 22 physical rank 7 will use CUDA device of pe 15
> Pe 21 physical rank 6 will use CUDA device of pe 15
> Pe 20 physical rank 5 will use CUDA device of pe 15
> Pe 19 physical rank 4 will use CUDA device of pe 15
> Pe 18 physical rank 3 will use CUDA device of pe 15
> Pe 16 physical rank 1 will use CUDA device of pe 15
> Pe 17 physical rank 2 will use CUDA device of pe 15
> Pe 15 physical rank 0 binding to CUDA device 0 on physical node 1: 'Tesla
> K20X' Mem: 5759MB Rev: 3.5
>
> Does this mean that only 2 GPUs were engaged for this 10-node job? If so,
> how do I recruit the other 8 GPUs? If not, then why does the output log
> only mention two GPUs?
>
> Thank you very much.
>
> P-L Chau
>
> Institut Pasteur
> Paris, France

This archive was generated by hypermail 2.1.6 : Tue Dec 27 2016 - 23:22:40 CST