From: Gerald Keller (gerald.keller_at_uni-wuerzburg.de)
Date: Mon May 25 2020 - 14:17:40 CDT
Hi Michael,
thanks for your reply! That really helped me move forward.
I used the function you posted to generate the nodelist and started NAMD for ibverbs-smp as you described.
Now charmrun doesn't seem to be able to connect to the node:
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Reconnection attempt 1 of 3
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Reconnection attempt 2 of 3
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Reconnection attempt 3 of 3
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Too many reconnection attempts; bailing out
Is there anything I could do at this point, or do I have to contact the HPC administrators to find a solution?
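From the error it looks like charmrun launches the remote processes over ssh and key-based login is not set up for my account. Would something along these lines be enough on my side (just a sketch, assuming the home directory is shared across the nodes), or does that require the administrators?

    # assumption: shared home directory, so the key only needs to be added once
    ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
    cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ssh devel-01.cluster.local hostname   # test that key-based login now works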
Best,
Gerald
>>> "Renfro, Michael" <Renfro_at_tntech.edu> 25.05.20 16:40 >>>
Not a NAMD expert, but I've had it installed on our HPC since 2017. Here's what I'd suggest, and someone else may come along to correct me:
- for runs on a single node, we use the multicore build, which is probably what you were using on your single nodes previously.
- for runs on multiple nodes with Infiniband, we use the ibverbs-smp build.
The multicore builds are pretty straightforward to run, the ibverbs-smp ones much less so. I ended up writing some bash
functions to make things easier for the end users [1]. Those helper functions end up calling namd as follows:
- Multicore build for single node runs: namd2 +p${SLURM_NTASKS} inputfile
- Ibverbs-smp build for multi-node runs: charmrun `which namd2` ++p ${SLURM_NTASKS} ++ppn ${SLURM_CPUS_ON_NODE} ++nodelist nodelist inputfile
where nodelist was generated from the contents of ${SLURM_NODELIST}. I got pretty good scaling on a 3M atom system up to
nearly 40 nodes.
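For reference, here's a minimal sketch of what that nodelist generation could look like inside a Slurm job script (a hypothetical helper, not the exact function from [1]; it assumes scontrol is available and uses the standard Charm++ "group main"/"host" nodelist format):

    # hypothetical helper: build a Charm++ nodelist file from the Slurm allocation
    make_nodelist() {
        echo "group main" > nodelist
        for host in $(scontrol show hostnames "$SLURM_NODELIST"); do
            echo "host $host" >> nodelist
        done
    }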
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

> On May 25, 2020, at 3:56 AM, Gerald Keller <gerald.keller_at_uni-wuerzburg.de> wrote:
>
> Hi all,
>
> I'm usually running NAMD on local single nodes in our working group. Since we are at the limit of our resources, I'd like to run NAMD simulations on an HPC.
> On the desired HPC, NAMD is not preinstalled and can't be loaded as a module; SLURM and InfiniBand are available. I'm not using GPUs.
>
> Now I'm not really sure how to get started.
> I downloaded the precompiled ibverbs and ibverbs-smp builds of NAMD 2.13 and tried to do a test run.
>
> Then I only observe errors:
>
> Assertion "num_devices > 0" failed in file machine-ibverbs.c line 482, and the tasks exit with a segmentation fault.
> I have run the test simulation with:
>
> srun -N 1 -n 3 -p partition_name namd2 conffile
>
> Could anybody point me in the right direction? Do I have to compile NAMD for the HPC on my own? Am I running completely wrong parameters regarding srun?
>
> Best regards,
> Gerald