Re: verbs-smp on a single node

From: Jeff Comer (jeffcomer_at_gmail.com)
Date: Mon May 01 2017 - 11:22:48 CDT

In case this is helpful to anyone, I figured out how to get multiple-copy
algorithms and CUDA working on a single node by avoiding the SMP builds
and using the netlrts Charm++ configuration. NAMD warns that the
performance will not be good. However, at least in my case, the performance
using netlrts with CUDA is about 75% of what I get for 3 simultaneous
independent runs using the official NAMD 2.12 multicore-CUDA build, which I
consider pretty good. Furthermore, my netlrts-CUDA build is still about 2.5
times faster than the official netlrts build (without CUDA).

3 independent multicore-CUDA: 150 ns/day/replica
3 multiple-copy netlrts-CUDA: 111 ns/day/replica
3 multiple-copy netlrts without CUDA: 42 ns/day/replica.

Here is my procedure. It assumes that I have charm-6.7.1 in
$HOME/Software/charm-6.7.1/ and previous builds of fftw, tcl, and
tcl-threaded in $HOME/Software/NAMD_2.10b1_Source.

# Build charm++
cd $HOME/Software/charm-6.7.1/
./build charm++ netlrts-linux-x86_64 --with-production

# Build NAMD
cd $HOME/Software/NAMD_2.12_Source/
ln -fs $HOME/Software/NAMD_2.10b1_Source/fftw
ln -fs $HOME/Software/NAMD_2.10b1_Source/tcl-threaded
ln -fs $HOME/Software/NAMD_2.10b1_Source/tcl
echo "CHARMBASE = $HOME/Software/charm-6.7.1" > Make.charm
# Edit ./config and remove the "exit 1" that follows the error message
# "Consider ibverbs-smp or verbs-smp (InfiniBand), gni-smp (Cray), or
# multicore (single node)." so that the netlrts architecture is accepted.
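# (Untested sketch, not necessarily what you want: if you prefer to script
# the edit, a sed along these lines should turn the first "exit 1" after
# that warning into a no-op, assuming the warning text and the exit sit
# close together in ./config. Check the file by hand afterwards.)
sed -i '/Consider ibverbs-smp or verbs-smp/,/exit 1/ s/exit 1/true/' config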
./config Linux-x86_64-g++ --cuda-prefix /usr/local/cuda-8.0 --with-cuda \
    --charm-arch netlrts-linux-x86_64
cd Linux-x86_64-g++
make -j6

# Run NAMD with three replicas on 12 cores and 3 GPUs
namd=$HOME/Software/NAMD_2.12_Source/Linux-x86_64-g++/namd2
charm=$HOME/Software/NAMD_2.12_Source/Linux-x86_64-g++/charmrun
f=sabf295_rmsd_HB4_wvsing.0.namd
rm -f ${f%.*}.*.log
$charm $namd +idlepoll ++verbose ++local ++ppn 12 +p 12 +replicas 3 \
    +devicesperreplica 1 $f +stdout ${f%.*}.%d.log & disown
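
# Optional check: each replica writes its own log (via +stdout above), so
# the benchmark lines can be compared directly; NAMD prints "Info: Benchmark
# time: ..." lines early in the run, and nvidia-smi shows how the processes
# are distributed over the GPUs.
grep "Benchmark time" ${f%.*}.*.log
nvidia-smi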

It should be noted that "nvidia-smi" shows 4 processes per GPU, which is
probably the reason for the warning about performance. However, if I reduce
to 2 processes per replica, the performance drops to about two-thirds of
what I get with 4 processes per replica (from 111 ns/day/replica to 74
ns/day/replica).
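
For reference, the 2-processes-per-replica case corresponds to a launch
along these lines (same files as above, with the total process count
reduced from 12 to 6; this is a sketch of the flag changes, not necessarily
the exact command):

$charm $namd +idlepoll ++verbose ++local ++ppn 6 +p 6 +replicas 3 \
    +devicesperreplica 1 $f +stdout ${f%.*}.%d.log & disown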

------------------------------------------
Jeffrey Comer, PhD
Assistant Professor
Institute of Computational Comparative Medicine
Nanotechnology Innovation Center of Kansas State
Kansas State University
Office: P-213 Mosier Hall
Phone: 785-532-6311
Website: http://jeffcomer.us

On Wed, Apr 26, 2017 at 3:13 PM, Jeff Comer <jeffcomer_at_gmail.com> wrote:

> My goal is to use multiple-copy algorithms and CUDA simultaneously on
> a workstation running Ubuntu linux. However, I can't seem to get the
> verbs-smp-CUDA or verbs-smp builds of NAMD to work. For simplicity,
> let's not talk about CUDA and just look at verbs-smp. I want to run 2
> replicas on a 6-core machine. I do the following:
>
> namd=$HOME/Software/NAMD_2.12_Linux-x86_64-verbs-smp-CUDA/namd2
> charm=$HOME/Software/NAMD_2.12_Linux-x86_64-verbs-smp-CUDA/charmrun
> f=sabf_graph_wvsing.0.namd
> $charm $namd ++verbose ++local ++ppn 6 +p 6 +pemap 0-5 +commap 2,5
> +replicas 2 $f +stdout ${f%.*}.%d.log
>
> and get these messages:
>
> Charmrun> charmrun started...
> Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
> Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
> Charmrun> Charmrun = 127.0.0.1, port = 34440
> Charmrun> IBVERBS version of charmrun
> Charmrun> start 0 node program on localhost.
> Charmrun> node programs all started
> Charmrun> Waiting for 0-th client to connect.
> Charmrun> error attaching to node '127.0.0.1':
> Socket closed before recv.
>
> I tried installing some ibverbs packages, but that didn't seem to fix
> the problem. Any ideas?
>
> Thanks,
> Jeff
>
> ------------------------------------------
> Jeffrey Comer, PhD
> Assistant Professor
> Institute of Computational Comparative Medicine
> Nanotechnology Innovation Center of Kansas State
> Kansas State University
> Office: P-213 Mosier Hall
> Phone: 785-532-6311
> Website: http://jeffcomer.us
>
>
