Re: question to run namd2 with multiple CPUs

From: Josh Vermaas (joshua.vermaas_at_gmail.com)
Date: Mon Mar 02 2020 - 15:30:03 CST

Hi Tao,

If I compile NAMD appropriately (MPI backend), I find that I can just use
the usual mpirun to run NAMD, and it's a good deal less tricky to set up than
charmrun. The exception is that if you want to run on GPUs, you can't have
an MPI backend and still get good performance, which is why that option is
disabled. Here is what I used to compile on CPU-based systems:

tar -zxf NAMD_2.13_Source.tar.gz
cd NAMD_2.13_Source
tar -xf charm-6.8.2.tar
cd charm-6.8.2
#Now start the charmc build.
#This is interactive, and lets you pick and choose options. For MPI, you
#want to let it use the mpicc it finds in the path
#./build
#The equivalent options are here:
./build charm++ mpi-linux-x86_64 mpicxx -j8
cd ..

#Download fftw/tcl libraries and move them. This is optional, but you have
#to change the arch files appropriately if you have your libraries in a
#different location than is the default. I always do.
tar xzf fftw-linux-x86_64.tar.gz
mv linux-x86_64 fftw
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
tar xzf tcl8.5.9-linux-x86_64.tar.gz
tar xzf tcl8.5.9-linux-x86_64-threaded.tar.gz
mv tcl8.5.9-linux-x86_64 tcl
mv tcl8.5.9-linux-x86_64-threaded tcl-threaded

./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64-mpicxx
#Compile NAMD.
cd Linux-x86_64-g++
make -j8
#Now all the binaries should be built. Move them to wherever you want!
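
To launch the resulting MPI build, something like the sketch below should work. It is only an illustration: the rank count, the namd2 path, and pc.conf are assumed from the earlier messages in this thread, and namd_mpi_cmd and reported_cpus are hypothetical helper names, not part of NAMD. The first helper just prints the mpirun command so it can be inspected before it is run, and the second pulls the CPU count out of an "Info: Initial time: N CPUs" line like the ones in the log quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not part of NAMD): print the mpirun command line for
# an MPI-built namd2 so it can be inspected before it is run.
# Arguments: number of MPI ranks, path to namd2, NAMD config file.
namd_mpi_cmd() {
    printf 'mpirun -np %s %s %s\n' "$1" "$2" "$3"
}

# Hypothetical helper: read NAMD log text on stdin and print the CPU count
# from the first "Info: Initial time: N CPUs ..." line, if there is one.
reported_cpus() {
    sed -n 's/^Info: Initial time: \([0-9][0-9]*\) CPUs.*/\1/p' | head -n 1
}

# Path and config name are assumed from the earlier messages in this thread.
namd_mpi_cmd 16 /share/apps/namd/namd2.13/namd2 pc.conf
```

In a job script you would run the printed command with its output redirected to pc.log. Afterwards, "reported_cpus < pc.log" should print 16 rather than 1 if all the ranks actually started.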

-Josh

On Mon, Mar 2, 2020 at 4:08 PM Yu, Tao <tao.yu.1_at_und.edu> wrote:

> Josh,
>
> Thank you so much!
>
> After loading the mpi and openmpi modules, we tried using
>
> /share/apps/namd/namd2.13/charmrun +p16 ++mpiexec ++remote-shell mpirun /share/apps/namd/namd2.13/namd2 pc.conf
>
> but it still did not work.
>
> Here is the error we got in the output file.
>
> Info: *****************************
> Info: Reading from binary file 300.restart.coor
> Info:
> Info: Entering startup at 8.05077 s, 222.184 MB of memory in use
> Info: Startup phase 0 took 0.000228167 s, 222.184 MB of memory in use
> [0] wc[0] status 9 wc[i].opcode 0
> [0] Stack Traceback:
> [0:0] [0x145023d]
> [0:1] [0x143c4fd]
> [0:2] [0x1462fb2]
> [0:3] [0x1458f82]
> [0:4] [0x53a5e0]
> [0:5] [0xcfa97e]
> [0:6] [0xed3ca0]
> [0:7] [0xd77f12]
> [0:8] [0xd77581]
> [0:9] [0x127f077]
> [0:10] [0x1459735]
> [0:11] [0xea5fcf]
> [0:12] TclInvokeStringCommand+0x88 [0x14b5338]
> [0:13] [0x14b7f57]
> [0:14] [0x14b9372]
> [0:15] [0x14b9b96]
> [0:16] [0x151bd41]
> [0:17] [0x151befe]
> [0:18] [0xe99ad0]
> [0:19] [0x517421]
> [0:20] __libc_start_main+0xf5 [0x2aaaabbc8545]
> [0:21] _ZNSt8ios_base4InitD1Ev+0x61 [0x40fc69]
>
>
> I think the issue is that our local cluster does not have a correct environment to use mpiexec.
>
> Any thoughts on this?
>
>
> Meanwhile, our cluster manager is planning to compile the source code. My question is: after compiling the source code, should I use mpirun or charmrun to run namd2?
>
>
>
>
> Best,
>
>
> Tao
>
> ------------------------------
> *From:* Josh Vermaas <joshua.vermaas_at_gmail.com>
> *Sent:* Friday, February 28, 2020 4:29 PM
> *To:* namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; Yu, Tao <tao.yu.1_at_und.edu>
> *Subject:* Re: namd-l: question to run namd2 with multiple CPUs
>
> Hi Tao,
>
> The default mpirun with no arguments will only start 1 mpi rank, so that's
> at least part of the problem. The other part of the problem is that the smp
> builds don't actually use mpi at all, and because of that they need to be
> started with charmrun. See
> https://www.ks.uiuc.edu/Research/namd/2.13/ug/node98.html for more
> information.
>
> I *think* what you will settle on is something like this:
>
> /share/apps/namd/namd2.13/charmrun +p16 ++mpiexec ++remote-shell mpirun /share/apps/namd/namd2.13/namd2 pc.conf
>
> Josh
>
> On Fri, Feb 28, 2020, 5:17 PM Yu, Tao <tao.yu.1_at_und.edu> wrote:
>
> Hi,
>
> Our university just installed the Linux-x86_64-ibverbs-smp
> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1546>
> (InfiniBand plus shared memory, no MPI needed) on our local cluster.
> Running on a single core works without problems. But when I started to test
> running with 16 cores, the output indicated it was still using "1" CPU.
>
> I tried submitting the job both with and without mpirun, but the result is
> the same: only 1 CPU was used.
>
> Please give me some help here.
>
> The script I was using is attached below
>
> #!/bin/bash
> #####Number of nodes
> #SBATCH --nodes=1
> #SBATCH --partition=talon
> #SBATCH --ntasks-per-node=16
> #SBATCH --workdir=.
> #####SBATCH -o slurm_run_%j_output.txt
> #####SBATCH -e slurm_run_%j_error.txt
> #SBATCH -s
> #SBATCH --time=00:10:00
>
> cd $SLURM_SUBMIT_DIR
> srun -n $SLURM_NTASKS hostname | sort -u > $SLURM_JOB_ID.hosts
>
> module load intel/mpi/64
>
> mpirun /share/apps/namd/namd2.13/namd2 pc.conf > pc.log
> **************************************************************************
>
> The output, which indicates that "1" CPU was being used:
>
> ENERGY: 0 482.9967 1279.0637 1052.3094
> 78.2784 -71938.7727 7732.6846 0.0000 0.0000
> 11869.2504 -49444.1894 298.3828 -61313.4398
> -49399.0775 298.3828 79.3838 81.2849
> 193787.4555 79.3838 81.2849
>
> OPENING EXTENDED SYSTEM TRAJECTORY FILE
> LDB: ============= START OF LOAD BALANCING ============== 2.06838
> LDB: ============== END OF LOAD BALANCING =============== 2.06856
> LDB: =============== DONE WITH MIGRATION ================ 2.06971
> LDB: ============= START OF LOAD BALANCING ============== 8.05131
> LDB: ============== END OF LOAD BALANCING =============== 8.05246
> LDB: =============== DONE WITH MIGRATION ================ 8.05289
> Info: Initial time: 1 CPUs 0.150798 s/step 0.872672 days/ns 236.934 MB memory
> LDB: ============= START OF LOAD BALANCING ============== 9.55101
> LDB: ============== END OF LOAD BALANCING =============== 9.55108
> LDB: =============== DONE WITH MIGRATION ================ 9.5515
> LDB: ============= START OF LOAD BALANCING ============== 15.3206
> LDB: ============== END OF LOAD BALANCING =============== 15.3217
> LDB: =============== DONE WITH MIGRATION ================ 15.3221
> Info: Initial time: 1 CPUs 0.144713 s/step 0.837461 days/ns 237.516 MB memory
> LDB: ============= START OF LOAD BALANCING ============== 22.4754
> LDB: ============== END OF LOAD BALANCING =============== 22.4766
> LDB: =============== DONE WITH MIGRATION ================ 22.477
> Info: Initial time: 1 CPUs 0.143573 s/step 0.830862 days/ns 237.516 MB memory
>
>
