Re: precompiled charm-6.8.2 for NAMD 2.13 nightly version compilation for multiple GPU node simulations

From: Aravinda Munasinghe (aravinda1879_at_gmail.com)
Date: Wed Jan 30 2019 - 17:17:56 CST

Dear Jim,
Thank you very much for your suggestions.
1) For charmrun, these are the libraries it requires (ldd charmrun):
linux-vdso.so.1 => (0x00007ffedaacf000)
libstdc++.so.6 => /apps/compilers/gcc/5.2.0/lib64/libstdc++.so.6
(0x00002b50e84af000)
libm.so.6 => /lib64/libm.so.6 (0x00002b50e883f000)
libgcc_s.so.1 => /apps/compilers/gcc/5.2.0/lib64/libgcc_s.so.1
(0x00002b50e8b47000)
libc.so.6 => /lib64/libc.so.6 (0x00002b50e8d5f000)
/lib64/ld-linux-x86-64.so.2 (0x00002b50e8287000)
and for hello:
linux-vdso.so.1 => (0x00007ffd0c54f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ae63379f000)
libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00002ae6339bf000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ae633bd7000)
libstdc++.so.6 => /apps/compilers/gcc/5.2.0/lib64/libstdc++.so.6
(0x00002ae633ddf000)
libm.so.6 => /lib64/libm.so.6 (0x00002ae63416f000)
libgcc_s.so.1 => /apps/compilers/gcc/5.2.0/lib64/libgcc_s.so.1
(0x00002ae634477000)
libc.so.6 => /lib64/libc.so.6 (0x00002ae63468f000)
/lib64/ld-linux-x86-64.so.2 (0x00002ae633577000)
libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002ae634a5f000)
libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002ae634ccf000)
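
Both binaries pick up libstdc++ and libgcc_s from /apps/compilers/gcc/5.2.0/lib64,
so one guess on my side is that the remote node-program dies because that
directory is not on LD_LIBRARY_PATH in the non-interactive shell on the other
nodes. I was planning to rule that out by exporting the path (taken from the
ldd output above) in my shell startup file / job script, e.g.:

export LD_LIBRARY_PATH=/apps/compilers/gcc/5.2.0/lib64:$LD_LIBRARY_PATH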

2) I tried the netlrts build (./build charm++ netlrts-linux-x86_64 smp
--with-production), but I still got the same error (Charmrun> Waiting for
0-th client to connect.).
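
As an additional check (not something I have tried yet), I plan to run the
netlrts hello entirely on the local machine, so ssh and the nodelist are
taken out of the picture:

./charmrun ./hello +p2 ++local ++verbose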

3) After running make in tests/charm++/simplearrayhello, I ran the following
command:
./charmrun ./hello ++verbose ++nodelist nodelist.31445455
The nodelist file contains the following:

group main
host login1-ib
host login2-ib
host login3-ib
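
If it helps, the next thing I intended to try was forcing the remote shell to
ssh explicitly (assuming I have the charmrun option name right):

./charmrun ./hello +p3 ++verbose ++remote-shell ssh ++nodelist nodelist.31445455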

However, when I tried charm++ without the smp build, it actually works
perfectly. But that charm++ architecture, with the NAMD Linux-x86_64-g++
build, did not support REMD (+devicesperreplica).
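
For context, the kind of multi-copy launch I am ultimately trying to get
working (following the replica-exchange examples shipped with NAMD; the
replica and processor counts below are just placeholders) is along the lines
of:

./charmrun ++nodelist nodelist +p8 ++ppn 4 namd2 +replicas 2 \
  +devicesperreplica 1 job0.conf +stdout output/%d/job0.%d.log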
Thank you,
Best,
Aravinda Munasinghe

On Wed, Jan 30, 2019 at 5:34 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:

>
> A few suggestions:
>
> 1) Run ldd verbs-linux-x86_64-smp/tests/charm++/simplearrayhello so you
> can see what shared libraries it needs.
>
> 2) Test the netlrts version to be sure your problem is not related to the
> InfiniBand verbs library.
>
> 3) Show the actual command you are using to run and use ++verbose.
>
> Jim
>
>
> On Tue, 29 Jan 2019, Aravinda Munasinghe wrote:
>
> > Hi Josh,
> > Thank you very much for your reply. There was no specific reason for using
> > the Intel compilers. As per your suggestion, I did try without icc (and
> > also with iccstatic), and charmrun still fails to run. The compilation does
> > complete with
> >
> > charm++ built successfully.
> > Next, try out a sample program like
> > verbs-linux-x86_64-smp/tests/charm++/simplearrayhello
> >
> > But when I try to run the hello executable with charmrun, I get the
> > following error:
> >
> > Charmrun> remote shell (localhost:0) started
> > Charmrun> node programs all started
> > Charmrun remote shell(localhost.0)> remote responding...
> > Charmrun remote shell(localhost.0)> starting node-program...
> > Charmrun remote shell(localhost.0)> remote shell phase successful.
> > Charmrun> Waiting for 0-th client to connect.
> > Charmrun> error attaching to node 'localhost':
> > Timeout waiting for node-program to connect
> >
> > This is the same error I kept getting all this time when I tried to compile
> > it myself. The only thing I cannot figure out is why the precompiled version
> > works perfectly, but when I build it from scratch it never works.
> > Any thoughts on this?
> > Best,
> > AM
> >
> >
> > On Tue, Jan 29, 2019 at 12:42 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
> > wrote:
> >
> >> Hi Aravinda,
> >>
> >> Any particular reason you want to use the Intel compilers? Since your goal
> >> is to use CUDA anyway, and the integration between the CUDA toolkit and the
> >> Intel compilers tends to be hit or miss depending on the machine, I'd try
> >> the GNU compilers first (just drop the icc from the build line). If you can
> >> get that working, then you can spend a bit more time debugging exactly what
> >> your error messages mean. It could just be as simple as using iccstatic
> >> instead of icc, so that the libraries are bundled into the executable at
> >> compile time, which would solve your LD_LIBRARY_PATH issues.
> >>
> >> -Josh
> >>
> >>
> >>
> >> On 2019-01-29 09:42:41-07:00 owner-namd-l_at_ks.uiuc.edu wrote:
> >>
> >> Dear NAMD users and developers,
> >> I have recently attempted to compile the NAMD 2.13 nightly build to run
> >> multiple GPU node replica exchange simulations using the REST2 methodology.
> >> First, I was able to run the current NAMD 2.13
> >> Linux-x86_64-verbs-smp-CUDA (multi-copy algorithms on InfiniBand) binaries
> >> with charmrun on our university cluster using a multiple node/GPU setup
> >> (with SLURM).
> >> Then, I tried compiling the NAMD 2.13 nightly version to use REST2 (since
> >> the current version has a bug with selecting solute atom IDs, as described
> >> here -
> >> https://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2018-2019/1424.html
> >> ), using the information on the NVIDIA site as well as what is mentioned
> >> in the release notes. But I failed miserably, as several others have (as I
> >> can see from the mailing list thread). Since the precompiled binaries of
> >> the current version work perfectly, I cannot think of a reason why my
> >> attempts failed other than some issue with the library files and compilers
> >> I am loading when building charm++ for a multiple node GPU setup. I used
> >> the following command to build charm++:
> >> ./build charm++ verbs-linux-x86_64 icc smp --with-production
> >> I used ifort and the Intel/2018 compilers.
> >> One thing I have noticed is that when I use the precompiled NAMD 2.13 I do
> >> not have to set LD_LIBRARY_PATH, but I had to do so when I compiled it
> >> myself (otherwise I kept getting missing library file errors).
> >> It would be a great help if any of you who have successfully compiled
> >> multi-node GPU NAMD 2.13 could share your charm-6.8.2 build files along
> >> with information on the compilers you used, so I could compile NAMD myself.
> >> Any advice on how to solve this, or sharing precompiled NAMD 2.13 nightly
> >> binaries, would also be highly appreciated.
> >> Thank you,
> >> Best,
> >> --
> >> Aravinda Munasinghe,
> >>
> >>
> >
> > --
> > Aravinda Munasinghe,
> >
>

-- 
Aravinda Munasinghe,
