Re: Re: Setting up / Running NAMD on HPC

From: Victor Kwan (vkwan8_at_uwo.ca)
Date: Mon May 25 2020 - 16:54:55 CDT

Dear Gerald,

Your first point of contact should always be your HPC administrators. They
are being paid to maintain the HPC, and to be frank, it is their
responsibility to install and test software upon request.

From what you posted, it looks like charmrun can't ssh into the individual
nodes - you will have to contact your system administrator.
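
Before writing to them, one quick check you can try yourself - assuming your
home directory is shared across the nodes (my assumption, worth verifying) -
is whether passwordless ssh to the node works at all, and if not, whether an
intra-cluster key fixes it:

    # should print the node's hostname without asking for a password
    ssh devel-01.cluster.local hostname

    # if it prompts for a password, an intra-cluster key often helps
    ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
    cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys

If it still prompts for a password after that, the cluster is restricting
ssh between nodes and it really is a job for the administrators.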

On Mon, May 25, 2020 at 3:35 PM Gerald Keller <
gerald.keller_at_uni-wuerzburg.de> wrote:

> Hi Michael,
>
> thanks for your reply! I think this really pushed me further.
>
> I used your posted function to generate the nodelist and started NAMD as
> you described for ibverbs-smp.
>
> Now charmrun does not seem to be able to connect to the node:
>
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Reconnection attempt 1 of 3
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Reconnection attempt 2 of 3
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Reconnection attempt 3 of 3
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Too many reconnection attempts; bailing out
>
> Is there anything I could do at this point, or do I have to contact the
> HPC administrators to find a solution?
>
> Best,
> Gerald
>
>
> >>> "Renfro, Michael" <Renfro_at_tntech.edu> 25.05.20 16.40 Uhr >>>
> Not a NAMD expert, but I've had it installed on our HPC cluster since 2017.
> Here’s what I’d suggest, and someone else may come along to correct me:
>
> - for runs on a single node, we use the multicore build, which is
> probably what you were using on your single nodes previously.
> - for runs on multiple nodes with Infiniband, we use the ibverbs-smp build.
>
> The multicore builds are pretty straightforward to run, the ibverbs-smp
> ones much less so. I ended up writing some bash functions to make things
> easier for the end users [1]. Those helper functions end up calling namd as
> follows:
>
> - Multicore build for single node runs:
>     namd2 +p${SLURM_NTASKS} inputfile
> - Ibverbs-smp build for multi-node runs:
>     charmrun `which namd2` ++p ${SLURM_NTASKS} ++ppn ${SLURM_CPUS_ON_NODE} ++nodelist nodelist inputfile
>
> where nodelist was generated from the contents of ${SLURM_NODELIST}. I got
> pretty good scaling on a 3M atom system up to nearly 40 nodes.
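>
> In case it helps to see the format: the nodelist charmrun expects is just a
> plain text file with a "group main" line followed by one "host <name>" line
> per node. A rough sketch of generating it inside the job script (see [1] for
> the actual helper functions; this assumes scontrol is available, which it
> normally is on Slurm clusters):
>
> echo "group main" > nodelist
> for host in $(scontrol show hostnames "${SLURM_NODELIST}"); do
>     echo "host $host" >> nodelist
> done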
>
> [1]
> https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD#HPCSampleJob:NAMD-Appendix:contentsofnamd_functions
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On May 25, 2020, at 3:56 AM, Gerald Keller <
> gerald.keller_at_uni-wuerzburg.de> wrote:
> >
> > Hi all,
> >
> > I'm usually running NAMD on local single nodes in our working group.
> Since we are at the limit of our resources, I'd like to run NAMD simulations
> on an HPC.
> > On the desired HPC, NAMD is not preinstalled and can't be loaded as a
> module; SLURM and InfiniBand are available. I'm not using GPUs.
> >
> > Now I'm not really sure how to get started.
> > I downloaded the precompiled ibverbs and ibverbs-smp builds of NAMD 2.13
> and tried to do a test run.
> >
> > Then I only observe errors:
> >
> > Assertion "num_devices > 0" failed in file machine-ibverbs.c line 482,
> and the tasks end with a segmentation fault.
> > I have run the test simulation with:
> >
> > srun -N 1 -n 3 -p partition_name namd2 conffile
> >
> > Could anybody point me in the right direction? Do I have to compile
> NAMD for the HPC myself? Am I using completely wrong parameters for
> srun?
> >
> > Best regards,
> > Gerald
>
