Re: Re: Setting up / Running NAMD on HPC

From: Gerald Keller (gerald.keller_at_uni-wuerzburg.de)
Date: Mon May 25 2020 - 14:17:40 CDT

Hi Michael,

Thanks for your reply! I think this really pushed me further.

I used your posted function to generate the nodelist and started NAMD as you described for the ibverbs-smp build.

Now charmrun does not seem to be able to connect to the node:

Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Reconnection attempt 1 of 3
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Reconnection attempt 2 of 3
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Reconnection attempt 3 of 3
Permission denied (publickey,password).
Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
Charmrun> Too many reconnection attempts; bailing out

Is there anything I could do at this point, or do I have to contact the HPC administrators to find a solution?
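
If the problem is that charmrun's ssh startup needs passwordless key-based login between the allocated nodes, I guess I
could try something like the following (only a rough sketch, assuming the home directory is shared across the nodes and
the site allows key-based SSH between compute nodes):

  # generate a passphrase-less key and authorize it for logins within the cluster
  # (skip the first step if a key already exists)
  ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
  cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  # quick test from inside a job allocation: should print the hostname without a password prompt
  ssh devel-01.cluster.local hostname

Or, if I read the Charm++/NAMD notes correctly, charmrun can hand the node startup to the scheduler instead of ssh,
e.g. charmrun ++mpiexec ++remote-shell srun `which namd2` ..., which would avoid the SSH setup entirely.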

Best,
Gerald

>>> "Renfro, Michael" <Renfro_at_tntech.edu> 25.05.20 16.40 Uhr >>>
Not a NAMD expert, but I've had it installed on our HPC since 2017. Here's what I'd suggest, and someone else may come
along to correct me:

- for runs on a single node, we use the multicore build, which is probably what you were using on your single nodes
previously.
- for runs on multiple nodes with Infiniband, we use the ibverbs-smp build.

The multicore builds are pretty straightforward to run, the ibverbs-smp ones much less so. I ended up writing some bash
functions to make things easier for the end users [1]. Those helper functions end up calling namd as follows:

- Multicore build for single-node runs: namd2 +p${SLURM_NTASKS} inputfile
- ibverbs-smp build for multi-node runs: charmrun `which namd2` ++p ${SLURM_NTASKS} ++ppn ${SLURM_CPUS_ON_NODE}
++nodelist nodelist inputfile

where nodelist was generated from the contents of ${SLURM_NODELIST}. I got pretty good scaling on a 3M atom system up to
nearly 40 nodes.
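
If it helps, the nodelist file charmrun expects is just a "group main" line followed by one "host" line per node. A
stripped-down sketch of roughly what the helper functions do (the real versions are in [1]; the file name "nodelist"
is arbitrary) would be:

  # expand SLURM's compressed host list (e.g. node[01-03]) into one name per line
  # and write it in the format charmrun's ++nodelist option expects
  echo "group main" > nodelist
  for host in $(scontrol show hostnames "$SLURM_NODELIST"); do
      echo "host $host" >> nodelist
  done
  charmrun `which namd2` ++p ${SLURM_NTASKS} ++ppn ${SLURM_CPUS_ON_NODE} ++nodelist nodelist inputfile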

[1] https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD#HPCSampleJob:NAMD-Appendix:contentsofnamd_functions

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601     / Tennessee Tech University
> On May 25, 2020, at 3:56 AM, Gerald Keller <gerald.keller_at_uni-wuerzburg.de> wrote:
> 
> Hi all, 
> 
> I'm usually running NAMD on local single nodes in our working group. Since we are at the limit of our resources, I'd
> like to run NAMD simulations on an HPC.
> On the desired HPC, NAMD is not preinstalled and can't be loaded as a module; SLURM and InfiniBand are available. I'm
> not using GPUs.
> 
> Now I'm not really sure how to get started. 
> I downloaded the precompiled ibverbs and ibverbs-smp builds of NAMD 2.13 and tried to do a test run.
> 
> Then I only get errors:
> 
> Assertion "num_devices > 0" failed in file machine-ibverbs.c line 482, and the tasks exit with a segmentation fault.
> I have run the test simulation with: 
> 
> srun -N 1 -n 3 -p partition_name namd2 conffile
> 
> Could anybody point me in the right direction? Do I have to compile NAMD for the HPC myself? Am I using
> completely wrong parameters for srun?
> 
> Best regards, 
> Gerald
