Re: Re: Setting up / Running NAMD on HPC

From: Josh Vermaas (joshua.vermaas_at_gmail.com)
Date: Mon May 25 2020 - 16:44:10 CDT

In my experience, for the smoothest installation experience and
interoperability with niceties like srun and ibrun, you either pretend that
you are running charmrun through MPI (see
https://charm.readthedocs.io/en/latest/charm++/manual.html#mpiexec), or you
build NAMD with MPI bindings. This lets you piggyback on the experience the
HPC folks have in running MPI executables, with the downside that MPI
builds with NAMD have poor GPU performance characteristics (not a problem
for you). It's been a while, but I *think* this just means you'd add
the ++mpiexec flag to your run line.

srun -N 1 -n 3 -p partition_name namd2 ++mpiexec conffile
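
If passing the flag straight to namd2 does not work, note that the Charm++
manual linked above documents ++mpiexec as a charmrun option, so a
charmrun-based run line may be needed instead. Here is a minimal sketch of
that, not something I have tested on your cluster. The helper name
charmrun_mpiexec_cmd is made up for illustration, it only prints the command
rather than running it, and it assumes srun is happy with the "-n <count>"
argument that charmrun hands to the launcher:

```shell
#!/bin/bash
# Sketch only: prints a charmrun command line instead of executing it.
# Assumption: ++mpiexec and ++remote-shell are charmrun options, so they
# must appear before the namd2 binary on the command line.
# charmrun_mpiexec_cmd is a hypothetical helper, not part of NAMD or Charm++.
charmrun_mpiexec_cmd() {
    local nprocs="$1" launcher="$2" conf="$3"
    printf 'charmrun +p%s ++mpiexec ++remote-shell %s namd2 %s\n' \
        "$nprocs" "$launcher" "$conf"
}

# Inside a SLURM job, SLURM_NTASKS holds the task count; fall back to 3
# (the value from the srun line above) when it is not set.
charmrun_mpiexec_cmd "${SLURM_NTASKS:-3}" srun conffile
```

With SLURM_NTASKS unset this prints
"charmrun +p3 ++mpiexec ++remote-shell srun namd2 conffile", which you would
then put in your batch script in place of the plain srun line.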

What I typically do though is just compile NAMD with an MPI backend, rather
than depending on ibverbs. Then you'd run NAMD like any other MPI
executable, and most places are set up to let you do this with a minimum of
fuss. This is what I did at one point to get this running on peregrine, a
mostly-CPU system that I used when I was at NREL:

#Commands to build NAMD on Peregrine, Sept. 2018
#Notes by Josh have "#" symbols in front of them. Actual commands are in
#plain text.
#Get NAMD source. Note that it may be faster to download this on your own
#machine and scp it over.
#Then unzip/tar it.
tar -zxf NAMD_2.13_Source.tar.gz

cd NAMD_2.13_Source
tar -xf charm-6.8.2.tar
cd charm-6.8.2
#These may change depending on what you want your compiler and MPI backend
#to be. gcc and OpenMPI worked for me. See what your own institution has.
module load gcc/7.2.0
module load openmpi-gcc/2.1.2-7.2.0

#Now start the Charm++ build.
#This is interactive, and lets you pick and choose options. For MPI, you
#want to let it use the mpicc it finds in the path:
#./build
#The equivalent options are here:
./build charm++ mpi-linux-x86_64 mpicxx -j8

cd ..
#These are the "default". You can also rejigger the build so that it uses a
#precompiled fftw or a fftw3 library.
wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-x86_64.tar.gz
tar xzf fftw-linux-x86_64.tar.gz
mv linux-x86_64 fftw
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
tar xzf tcl8.5.9-linux-x86_64.tar.gz
tar xzf tcl8.5.9-linux-x86_64-threaded.tar.gz
mv tcl8.5.9-linux-x86_64 tcl
mv tcl8.5.9-linux-x86_64-threaded tcl-threaded

#charm-arch is *usually* set by configuration files within the "arch"
#directory. None of them think you are using mpi,
#so this needs to be set explicitly with the --charm-arch option
./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64-mpicxx

#Compile NAMD.
cd Linux-x86_64-g++
make -j8

#Now all the binaries should be built. Move them to wherever you want!
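
Once the MPI build exists, the job script ends up looking like one for any
other MPI program. As a rough sketch (write_namd_job is a made-up helper name,
the node and task counts are arbitrary examples, and partition_name and
conffile are the placeholders from your srun line), something like this
prints a batch script you could adapt. I'm assuming here that your site's
OpenMPI can be launched through srun; if not, mpirun in its place is the
usual fallback:

```shell
#!/bin/bash
# Sketch only: prints a SLURM batch script for an MPI build of NAMD.
# write_namd_job is a hypothetical helper; all arguments are placeholders.
write_namd_job() {
    local nodes="$1" tasks_per_node="$2" partition="$3" conf="$4"
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=${nodes}
#SBATCH --ntasks-per-node=${tasks_per_node}
#SBATCH --partition=${partition}

# Load the same compiler and MPI modules that were used for the build.
module load gcc/7.2.0
module load openmpi-gcc/2.1.2-7.2.0

# An MPI build needs no charmrun and no nodelist; the launcher starts all ranks.
srun namd2 ${conf} > ${conf}.log
EOF
}

# Example: 2 nodes with 24 MPI ranks each (made-up numbers).
write_namd_job 2 24 partition_name conffile
```

Redirect the output to a file (for example namd_job.sh) and submit it with
sbatch. The module names are the ones from Peregrine above, so swap in
whatever your cluster provides.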

On Mon, May 25, 2020, 1:36 PM Gerald Keller <gerald.keller_at_uni-wuerzburg.de>
wrote:

> Hi Michael,
>
> thanks for your reply! I think this really pushed me further.
>
> I used your posted function to generate the nodelist and started namd like
> you mentioned for ibverbs-smp.
>
> Now Charmrun seems not to be able to connect to the node:
>
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Reconnection attempt 1 of 3
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Reconnection attempt 2 of 3
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Reconnection attempt 3 of 3
> Permission denied (publickey,password).
> Charmrun> Error 255 returned from remote shell (devel-01.cluster.local:0)
> Charmrun> Too many reconnection attempts; bailing out
>
> Is there anything I could do at this point, or do I have to contact the
> HPC administrators to find a solution?
>
> Best,
> Gerald
>
>
> >>> "Renfro, Michael" <Renfro_at_tntech.edu> 25.05.20 16.40 Uhr >>>
> Not a NAMD expert, but I've had it installed in our HPC since 2017. Here’s
> what I’d suggest, and someone else may come along to correct me:
>
> - for runs on a single node, we use the multicore build, which is
> probably what you were using on your single nodes previously.
> - for runs on multiple nodes with Infiniband, we use the ibverbs-smp build.
>
> The multicore builds are pretty straightforward to run, the ibverbs-smp
> ones much less so. I ended up writing some bash functions to make things
> easier for the end users [1]. Those helper functions end up calling namd as
> follows:
>
> - Multicore build for single node runs: namd2 +p${SLURM_NTASKS} inputfile
> - Ibverbs-smp build for multi-node runs: charmrun `which namd2` ++p
> ${SLURM_NTASKS} ++ppn ${SLURM_CPUS_ON_NODE} ++nodelist nodelist inputfile
>
> where nodelist was generated from the contents of ${SLURM_NODELIST}. I got
> pretty good scaling on a 3M atom system up to nearly 40 nodes.
>
> [1]
> https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD#HPCSampleJob:NAMD-Appendix:contentsofnamd_functions
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On May 25, 2020, at 3:56 AM, Gerald Keller <
> gerald.keller_at_uni-wuerzburg.de> wrote:
> >
> > Hi all,
> >
> > I'm usually running NAMD on local single nodes in our working group.
> Since we are at the limit of our resources I'd like to run NAMD simulations
> on a HPC.
> > On the desired HPC, NAMD is not preinstalled / can't be loaded as a
> module, SLURM and infiniband are available. I'm not using GPUs.
> >
> > Now I'm not really sure how to get started.
> > I downloaded the precompiled ibverbs and ibverbs-smp builds of NAMD
> 2.13 and tried to do a test run with each.
> >
> > Then I only observe errors:
> >
> > Assertion "num_devices > 0" failed in file machine-ibverbs.c line 482
> and that the tasks have segmentation fault.
> > I have run the test simulation with:
> >
> > srun -N 1 -n 3 -p partition_name namd2 conffile
> >
> > Could anybody point me in the right direction? Do I have to compile
> NAMD for the HPC on my own? Am I passing completely wrong parameters
> to srun?
> >
> > Best regards,
> > Gerald
>
