Re: precompiled charm-6.8.2 for NAMD 2.13 nightly version compilation for multi-GPU-node simulations

From: Aravinda Munasinghe (aravinda1879_at_gmail.com)
Date: Wed Jan 30 2019 - 20:18:13 CST

Dear Jim and Joshua,

I finally figured out what was going on. Loading the Intel modules before
compiling Charm++ (I was building on our university cluster, which uses Slurm)
broke the Charm++ build, even though I was only using the GNU compilers. As
Joshua first suggested, the Intel environment somehow interferes with the
Charm++ compilation. With the Intel modules left unloaded, I was able to build
the NAMD smp-CUDA version successfully.

For anyone who runs into this error, this may be the solution: make sure the
Intel modules are not loaded (if you are compiling on a cluster) when building
Charm++ with SMP for multi-node GPU NAMD runs.
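As a rough sketch, the clean-environment build looked something like the lines
below. The module names are placeholders for whatever your cluster provides
(a CUDA module is only needed later, for the NAMD build itself):

module purge        # make sure no Intel modules are left loaded
module load gcc     # GNU toolchain only
./build charm++ verbs-linux-x86_64 smp --with-production
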
Thanks all for your help and guidance,
Best,
Aravinda Munasinghe

On Wed, Jan 30, 2019 at 6:17 PM Aravinda Munasinghe <aravinda1879_at_gmail.com>
wrote:

> Dear Jim,
> Thank you very much for your suggestions.
> 1) For charmrun, ldd reports the following (ldd ./charmrun):
> linux-vdso.so.1 => (0x00007ffedaacf000)
> libstdc++.so.6 => /apps/compilers/gcc/5.2.0/lib64/libstdc++.so.6 (0x00002b50e84af000)
> libm.so.6 => /lib64/libm.so.6 (0x00002b50e883f000)
> libgcc_s.so.1 => /apps/compilers/gcc/5.2.0/lib64/libgcc_s.so.1 (0x00002b50e8b47000)
> libc.so.6 => /lib64/libc.so.6 (0x00002b50e8d5f000)
> /lib64/ld-linux-x86-64.so.2 (0x00002b50e8287000)
> For hello (ldd ./hello):
> linux-vdso.so.1 => (0x00007ffd0c54f000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ae63379f000)
> libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00002ae6339bf000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00002ae633bd7000)
> libstdc++.so.6 => /apps/compilers/gcc/5.2.0/lib64/libstdc++.so.6 (0x00002ae633ddf000)
> libm.so.6 => /lib64/libm.so.6 (0x00002ae63416f000)
> libgcc_s.so.1 => /apps/compilers/gcc/5.2.0/lib64/libgcc_s.so.1 (0x00002ae634477000)
> libc.so.6 => /lib64/libc.so.6 (0x00002ae63468f000)
> /lib64/ld-linux-x86-64.so.2 (0x00002ae633577000)
> libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002ae634a5f000)
> libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002ae634ccf000)
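>
> (For reference, when the GCC runtime libraries do need to be found at run
> time, the workaround on my side is a line like the one below; the path is
> simply the GCC lib64 directory shown in the ldd output above, so adjust it
> to your own toolchain:)
>
> export LD_LIBRARY_PATH=/apps/compilers/gcc/5.2.0/lib64:$LD_LIBRARY_PATH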
>
> (2) I tried the netlrts build (./build charm++ netlrts-linux-x86_64 smp
> --with-production), but I still got the same set of errors (Charmrun>
> Waiting for 0-th client to connect.).
>
> (3) After running make in tests/charm++/simplearrayhello, I ran the
> following command:
> ./charmrun ./hello ++verbose ++nodelist nodelist.31445455
> The nodelist file contains the following:
>
> group main
> host login1-ib
> host login2-ib
> host login3-ib
>
> However, when I built Charm++ without SMP, it actually works perfectly. But
> that Charm++ architecture with the NAMD Linux-x86_64-g++ build did not
> support REMD (+devicesperreplica).
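>
> (For context, the multi-copy run I am ultimately aiming for would be
> launched roughly like this once a verbs-smp build works; the process and
> thread counts, nodelist name, and the config/log file names below are only
> placeholders:)
>
> charmrun ++nodelist nodelist +p8 ++ppn 4 ./namd2 +replicas 2 \
>   +devicesperreplica 1 rest2.conf +stdout output/%d/job.%d.log
>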
> Thank you,
> Best,
> Aravinda Munasinghe
>
> On Wed, Jan 30, 2019 at 5:34 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
>
>>
>> A few suggestions:
>>
>> 1) Run ldd verbs-linux-x86_64-smp/tests/charm++/simplearrayhello so you
>> can see what shared libraries it needs.
>>
>> 2) Test the netlrts version to be sure your problem is not related to the
>> InfiniBand verbs library.
>>
>> 3) Show the actual command you are using to run and use ++verbose.
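>>
>> (On a typical setup those three steps might look roughly like this; the
>> +p4 process count and the nodelist file name are only examples:)
>>
>> ldd verbs-linux-x86_64-smp/tests/charm++/simplearrayhello/hello
>> ./build charm++ netlrts-linux-x86_64 smp --with-production
>> ./charmrun ./hello +p4 ++verbose ++nodelist nodelist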
>>
>> Jim
>>
>>
>> On Tue, 29 Jan 2019, Aravinda Munasinghe wrote:
>>
>> > Hi Josh,
>> > Thank you very much for your reply. There was no specific reason for
>> > using the Intel compilers. As per your suggestion, I did try without icc
>> > (and also with iccstatic), and charmrun still fails to run. The
>> > compilation does complete with:
>> >
>> > charm++ built successfully.
>> > Next, try out a sample program like
>> > verbs-linux-x86_64-smp/tests/charm++/simplearrayhello
>> >
>> > But when I try to run the hello executable with charmrun, I get the
>> > following error:
>> >
>> > Charmrun> remote shell (localhost:0) started
>> > Charmrun> node programs all started
>> > Charmrun remote shell(localhost.0)> remote responding...
>> > Charmrun remote shell(localhost.0)> starting node-program...
>> > Charmrun remote shell(localhost.0)> remote shell phase successful.
>> > Charmrun> Waiting for 0-th client to connect.
>> > Charmrun> error attaching to node 'localhost':
>> > Timeout waiting for node-program to connect
>> >
>> > This is the same error I have kept getting all this time when I try to
>> > compile it myself. The one thing I cannot figure out is why the
>> > precompiled version works perfectly, but when I try to build from scratch
>> > it never works. Any thoughts on this?
>> > Best,
>> > AM
>> >
>> >
>> > On Tue, Jan 29, 2019 at 12:42 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov> wrote:
>> >
>> >> Hi Aravinda,
>> >>
>> >> Any particular reason you want to use the intel compilers? Since your
>> >> goal is to use CUDA anyway, and the integration between the CUDA toolkit
>> >> and the intel compilers tends to be hit or miss depending on the machine,
>> >> I'd try the GNU compilers first (just drop the icc from the build line).
>> >> If you can get that working, then you can spend a bit more time debugging
>> >> exactly what your error messages mean. It could just be as simple as
>> >> using iccstatic instead of icc, so that the libraries are bundled into
>> >> the executable at compile time, which would solve your LD_LIBRARY_PATH
>> >> issues.
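>> >>
>> >> (For concreteness, the two build lines being contrasted would be roughly
>> >> the following; only the compiler token changes:)
>> >>
>> >> ./build charm++ verbs-linux-x86_64 smp --with-production
>> >> ./build charm++ verbs-linux-x86_64 iccstatic smp --with-production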
>> >>
>> >> -Josh
>> >>
>> >>
>> >>
>> >> On 2019-01-29 09:42:41-07:00 owner-namd-l_at_ks.uiuc.edu wrote:
>> >>
>> >> Dear NAMD users and developers,
>> >>
>> >> I have recently attempted to compile the NAMD 2.13 nightly build to run
>> >> multi-GPU-node replica exchange simulations using the REST2 methodology.
>> >> First, I was able to run the released NAMD 2.13
>> >> Linux-x86_64-verbs-smp-CUDA (multi-copy algorithms on InfiniBand)
>> >> binaries with charmrun on our university cluster in a multi-node/GPU
>> >> setup (under Slurm).
>> >> Then, I tried compiling the NAMD 2.13 nightly version to use REST2
>> >> (since the released version has a bug with selecting solute atom IDs, as
>> >> described here:
>> >> https://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2018-2019/1424.html),
>> >> following the information on the NVIDIA site as well as what is mentioned
>> >> in the release notes. But I failed miserably, as several others have (as
>> >> I can see from the mailing list threads). Since the precompiled binaries
>> >> of the released version work perfectly, I cannot think of a reason why my
>> >> attempts failed other than some issue with the library files and
>> >> compilers I am loading when building Charm++ for a multi-node GPU setup.
>> >> I used the following command to build Charm++:
>> >>
>> >> ./build charm++ verbs-linux-x86_64 icc smp --with-production
>> >>
>> >> with ifort and the intel/2018 compiler modules.
>> >> One thing I have noticed is that with the precompiled NAMD 2.13 I did not
>> >> have to set LD_LIBRARY_PATH, but I had to do so when I compiled it myself
>> >> (otherwise I keep getting missing-library errors).
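>> >>
>> >> (For reference, the corresponding NAMD configure step is roughly the
>> >> following, shown here against a GNU-built Charm++; the charm arch would
>> >> differ for an icc build, and the CUDA prefix is just an example path:)
>> >>
>> >> ./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-smp \
>> >>     --with-cuda --cuda-prefix /usr/local/cuda
>> >> cd Linux-x86_64-g++
>> >> make
>> >>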
>> >> It would be a great help if any of you who have successfully compiled a
>> >> multi-node GPU NAMD 2.13 could share your charm-6.8.2 build files, along
>> >> with information on the compilers you used, so I could compile NAMD
>> >> myself. Any advice on how to solve this, or precompiled NAMD 2.13
>> >> binaries for the nightly version itself, would be highly appreciated.
>> >> Thank you,
>> >> Best,
>> >> --
>> >> Aravinda Munasinghe,
>> >>
>> >>
>> >
>> > --
>> > Aravinda Munasinghe,
>> >
>>
>
>
> --
> Aravinda Munasinghe,
>
>

-- 
Aravinda Munasinghe,

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:20:28 CST