How to run on multi-node environment

From: Luis Cebamanos (luiceur_at_gmail.com)
Date: Wed Dec 22 2021 - 08:39:58 CST

Hello all,

I am trying to run NAMD in a multi-node, multi-GPU environment (NAMD built
with Charm-verbs, CUDA, SMP and Intel). Each node has 4 GPUs and 40 CPUs:

charmrun ++nodelist nodeListFile.txt ++p 72 ++ppn 9 namd2 \
    +devices 0,1,2,3 +isomalloc_sync +setcpuaffinity +idlepoll \
    +pemap 1-9,11-19,21-29,31-39 +commap 0,10,20,30 stmv.namd

where my nodeListFile.txt looks like:

group main
host andraton11 host andraton12 ++cpus 40 ++shell ssh
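
For reference, the layout implied by the launch line and nodelist above can be worked out by hand. The sketch below is only arithmetic on the values in this post. It assumes that in an SMP build ++p is the total number of worker threads and ++ppn the worker threads per process, and that processes are spread evenly over the two hosts. `count_cores` is a hypothetical helper that handles only plain comma-separated ranges, not the full +pemap syntax.

```python
def count_cores(cpu_map: str) -> int:
    """Count cores in a map such as '1-9,11-19' (ranges are inclusive)."""
    total = 0
    for part in cpu_map.split(","):
        if "-" in part:
            low, high = part.split("-")
            total += int(high) - int(low) + 1
        else:
            total += 1
    return total


total_pes = 72        # ++p 72
pes_per_process = 9   # ++ppn 9
hosts = 2             # andraton11 and andraton12

processes = total_pes // pes_per_process   # total processes launched
processes_per_host = processes // hosts    # assumes an even spread over hosts

worker_cores = count_cores("1-9,11-19,21-29,31-39")  # +pemap, per host
comm_cores = count_cores("0,10,20,30")               # +commap, per host

print(processes, processes_per_host, worker_cores, comm_cores)
```

Under those assumptions each host would carry 4 processes of 9 worker threads each, plus one communication thread per process, which matches the 36 cores in +pemap and the 4 cores in +commap.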

I am getting the following error:

FATAL ERROR: Number of devices (4) is not a multiple of number of
processes (8).  Sharing devices between processes is inefficient.
Specify +ignoresharing (each process uses all visible devices) if not
all devices are visible to each process, otherwise adjust number of
processes to evenly divide number of devices, specify subset of devices
with +devices argument (e.g., +devices 0,2), or multiply list shared
devices (e.g., +devices 0,1,2,0).
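
The condition stated in the error message can be checked by hand. The sketch below only mirrors the wording of the message (the device count must be a multiple of the process count). It is not taken from the NAMD source, and `devices_divide_evenly` is a hypothetical helper.

```python
def devices_divide_evenly(num_devices: int, num_processes: int) -> bool:
    """True when the device count is a multiple of the process count."""
    return num_devices % num_processes == 0


# Case reported in the error: 4 devices, 8 processes.
print(devices_divide_evenly(4, 8))  # False

# Case expected if each 4-GPU node ran 4 processes.
print(devices_divide_evenly(4, 4))  # True
```

The 8 in the message equals the total process count given by ++p 72 ++ppn 9, not the 4 per node that an even spread over two hosts would give. Whether that points to how the processes are being placed is an assumption that the error output does not confirm.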

If not using +ignoresharing, how should I run this correctly?

Regards,
