NAMD 2.11 on CRAY XK with CPU+GPU

From: P.-L. Chau (pc104_at_pasteur.fr)
Date: Wed Nov 30 2016 - 17:47:07 CST

I would like to ask about using CPUs and GPUs on a CRAY XK7 to run NAMD
2.11.

I use the pre-installed NAMD 2.11 from the supercomputing centre, which is
already GPU-enabled. I have checked with the help desk, and they assure me
that my job submission script would automatically activate the GPU linked
to each CPU.

I have requested 10 nodes, each with 16 cores, and each node is also linked
to a GPU. The NAMD output contains this:
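For reference, a typical launch for this setup would look something like the
sketch below. This is a hypothetical PBS job-script fragment, not my actual
script; the input and output file names are placeholders, but the aprun and
NAMD SMP options follow the usual Cray XK pattern (one SMP process per node,
one core reserved for the communication thread):

```shell
#!/bin/bash
# Hypothetical job-script sketch for 10 Cray XK7 nodes, 16 cores each.
# File names (sim.namd, sim.log) are placeholders.
#PBS -l nodes=10:ppn=16
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR

# -n 10 : ten processes in total, -N 1 : one SMP process per node,
# -d 16 : 16 cores per process.
# +ppn 15 gives NAMD 15 worker threads per process, while +commap 0
# and +pemap 1-15 pin the communication thread to core 0 and the
# workers to cores 1-15 -- matching the "PE-core map : 1-15" and
# "set comm 0 on node 0 to core #0" lines in the log below.
aprun -n 10 -N 1 -d 16 namd2 +ppn 15 +pemap 1-15 +commap 0 sim.namd > sim.log
```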

CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> cpuaffinity PE-core map : 1-15
Charm++> set comm 0 on node 0 to core #0
Charm++> Running on 10 unique compute nodes (16-way SMP).
Info: Built with CUDA version 7000
Pe 24 physical rank 9 will use CUDA device of pe 15
Pe 29 physical rank 14 will use CUDA device of pe 15
Pe 3 physical rank 3 will use CUDA device of pe 8
Pe 5 physical rank 5 will use CUDA device of pe 8
Pe 6 physical rank 6 will use CUDA device of pe 8
Pe 14 physical rank 14 will use CUDA device of pe 8
Pe 13 physical rank 13 will use CUDA device of pe 8
Pe 1 physical rank 1 will use CUDA device of pe 8
Pe 2 physical rank 2 will use CUDA device of pe 8
Pe 4 physical rank 4 will use CUDA device of pe 8
Pe 0 physical rank 0 will use CUDA device of pe 8
Pe 12 physical rank 12 will use CUDA device of pe 8

and then subsequently this:

Pe 11 physical rank 11 will use CUDA device of pe 8
Pe 7 physical rank 7 will use CUDA device of pe 8
Pe 10 physical rank 10 will use CUDA device of pe 8
Pe 9 physical rank 9 will use CUDA device of pe 8
Pe 8 physical rank 8 binding to CUDA device 0 on physical node 0: 'Tesla K20X' Mem: 5759MB Rev: 3.5
CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level notification) but not using node-level queue
Pe 27 physical rank 12 will use CUDA device of pe 15
Pe 26 physical rank 11 will use CUDA device of pe 15
Pe 28 physical rank 13 will use CUDA device of pe 15
Pe 25 physical rank 10 will use CUDA device of pe 15
Pe 23 physical rank 8 will use CUDA device of pe 15
Pe 22 physical rank 7 will use CUDA device of pe 15
Pe 21 physical rank 6 will use CUDA device of pe 15
Pe 20 physical rank 5 will use CUDA device of pe 15
Pe 19 physical rank 4 will use CUDA device of pe 15
Pe 18 physical rank 3 will use CUDA device of pe 15
Pe 16 physical rank 1 will use CUDA device of pe 15
Pe 17 physical rank 2 will use CUDA device of pe 15
Pe 15 physical rank 0 binding to CUDA device 0 on physical node 1: 'Tesla K20X' Mem: 5759MB Rev: 3.5

Does this mean that only 2 GPUs were engaged for this 10-node job? If so,
how do I recruit the other 8 GPUs? If not, why does the output log only
mention two GPUs?

Thank you very much.

P-L Chau

Institut Pasteur
Paris, France

This archive was generated by hypermail 2.1.6 : Sun Dec 31 2017 - 23:20:50 CST