Re: precompiled charm-6.8.2 for NAMD 2.13 nightly version compilation for multiple GPU node simulations

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Thu Jan 31 2019 - 08:50:46 CST

Hi Aravinda, your suggestion is probably specific to how the environment
modules are set up for the Intel compiler on your cluster.

On other clusters, Intel compilers work fine with charm++, provided that the
same compiler that is loaded by the module is also the one passed as an
argument to the charm++ build script (i.e. don't load the Intel module and
then use gcc to build charm++).
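
For example, the consistent Intel pairing would look roughly like this (just a
sketch; the module name is a placeholder for whatever your cluster provides):

  module load intel/2018
  ./build charm++ verbs-linux-x86_64 icc smp --with-production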

Giacomo

On Wed, Jan 30, 2019 at 9:20 PM Aravinda Munasinghe <aravinda1879_at_gmail.com>
wrote:

> Dear Jim and Vermaas,
>
> I finally figured out what's going on. Apparently, loading the Intel modules
> (I was trying to compile on the university cluster, which uses SLURM) before
> compiling charm++ messed up the whole charm++ compilation process, even
> though I was only using the GNU compilers. As per Joshua's first
> recommendation, for some reason Intel messes up the charm++ compilation. For
> now, this helped me successfully compile the namd-smp-cuda version.
>
> For those who get this error, this may be the solution: just make sure not to
> load the Intel modules (if you are compiling on a cluster) when building
> charm++ with SMP for multi-node GPU NAMD calculations.
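>
> Roughly, the kind of sequence I mean is the following (gcc/5.2.0 happens to
> be what our cluster provides; adjust the module names to yours):
>
>   module purge
>   module load gcc/5.2.0
>   ./build charm++ verbs-linux-x86_64 smp --with-production
>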
> Thanks all for your help and guidance,
> Best,
> Aravinda Munasinghe
>
> On Wed, Jan 30, 2019 at 6:17 PM Aravinda Munasinghe <
> aravinda1879_at_gmail.com> wrote:
>
>> Dear Jim,
>> Thank you very much for your suggestions.
>> 1) For charmrun, these are the shared libraries it requires (ldd charmrun):
>> linux-vdso.so.1 => (0x00007ffedaacf000)
>> libstdc++.so.6 => /apps/compilers/gcc/5.2.0/lib64/libstdc++.so.6
>> (0x00002b50e84af000)
>> libm.so.6 => /lib64/libm.so.6 (0x00002b50e883f000)
>> libgcc_s.so.1 => /apps/compilers/gcc/5.2.0/lib64/libgcc_s.so.1
>> (0x00002b50e8b47000)
>> libc.so.6 => /lib64/libc.so.6 (0x00002b50e8d5f000)
>> /lib64/ld-linux-x86-64.so.2 (0x00002b50e8287000)
>> For hello (ldd hello):
>> linux-vdso.so.1 => (0x00007ffd0c54f000)
>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ae63379f000)
>> libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00002ae6339bf000)
>> libdl.so.2 => /lib64/libdl.so.2 (0x00002ae633bd7000)
>> libstdc++.so.6 => /apps/compilers/gcc/5.2.0/lib64/libstdc++.so.6
>> (0x00002ae633ddf000)
>> libm.so.6 => /lib64/libm.so.6 (0x00002ae63416f000)
>> libgcc_s.so.1 => /apps/compilers/gcc/5.2.0/lib64/libgcc_s.so.1
>> (0x00002ae634477000)
>> libc.so.6 => /lib64/libc.so.6 (0x00002ae63468f000)
>> /lib64/ld-linux-x86-64.so.2 (0x00002ae633577000)
>> libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002ae634a5f000)
>> libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002ae634ccf000)
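>>
>> (Both binaries resolve libstdc++.so.6 and libgcc_s.so.1 from the gcc/5.2.0
>> module tree, so if the remote nodes don't see that directory on
>> LD_LIBRARY_PATH, the node-program can fail to start. What I export before
>> running, taken from the paths above, is just:
>>
>>   export LD_LIBRARY_PATH=/apps/compilers/gcc/5.2.0/lib64:$LD_LIBRARY_PATH
>> )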
>>
>> (2) I tried with netlrts (./build charm++ netlrts-linux-x86_64 smp
>> --with-production)
>> But I still got the same set of errors (Charmrun> Waiting for 0-th client
>> to connect.).
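>>
>> (As a purely local sanity check that takes ssh and the nodelist out of the
>> picture, the netlrts hello can also be launched like this - just a sketch:
>>
>>   ./charmrun ++local +p2 ./hello
>> )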
>>
>> (3) After running make in tests/charm++/simplearrayhello, I ran the
>> following command:
>> ./charmrun ./hello ++verbose ++nodelist nodelist.31445455
>> And the nodelist file contains the following:
>>
>> group main
>> host login1-ib
>> host login2-ib
>> host login3-ib
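>>
>> (In case it is useful to others: inside a SLURM batch script a nodelist like
>> this can be generated from the job's allocation, roughly along these lines -
>> a sketch only, host name suffixes aside:
>>
>>   echo "group main" > nodelist.$SLURM_JOB_ID
>>   for h in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
>>     echo "host $h" >> nodelist.$SLURM_JOB_ID
>>   done
>> )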
>>
>> However, when I tried charm++ without the SMP build, it actually works
>> perfectly. But that charm++ architecture with the NAMD Linux-x86_64-g++
>> build did not support REMD (+devicesperreplica).
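>>
>> (For reference, the kind of multi-copy launch I am ultimately aiming for,
>> following the replica-exchange examples that ship with NAMD - the core and
>> replica counts here are just placeholders:
>>
>>   charmrun ++nodelist nodelist.$SLURM_JOB_ID +p8 ++ppn 4 namd2 \
>>     +replicas 2 +devicesperreplica 1 job0.conf +stdout output/%d/job0.%d.log
>> )
>>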
>> Thank you,
>> Best,
>> Aravinda Munasinghe
>>
>> On Wed, Jan 30, 2019 at 5:34 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
>>
>>>
>>> A few suggestions:
>>>
>>> 1) Run ldd verbs-linux-x86_64-smp/tests/charm++/simplearrayhello so you
>>> can see what shared libraries it needs.
>>>
>>> 2) Test the netlrts version to be sure your problem is not related to
>>> the
>>> InfiniBand verbs library.
>>>
>>> 3) Show the actual command you are using to run and use ++verbose.
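>>>
>>> In other words, something along the lines of:
>>>
>>>   ldd verbs-linux-x86_64-smp/tests/charm++/simplearrayhello/hello
>>>   ./build charm++ netlrts-linux-x86_64 smp --with-production
>>>   ./charmrun ./hello ++verbose ++nodelist nodelist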
>>>
>>> Jim
>>>
>>>
>>> On Tue, 29 Jan 2019, Aravinda Munasinghe wrote:
>>>
>>> > Hi Josh,
>>> > Thank you very much for your reply. There was no specific reason for using
>>> > the Intel compilers. As per your suggestion, I did try without icc (and
>>> > also with iccstatic), and charmrun still fails to run. Compilation does
>>> > complete with
>>> >
>>> > charm++ built successfully.
>>> > Next, try out a sample program like
>>> > verbs-linux-x86_64-smp/tests/charm++/simplearrayhello
>>> >
>>> > But when I try to run the hello executable with charmrun, I get the
>>> > following error:
>>> >
>>> > Charmrun> remote shell (localhost:0) started
>>> > Charmrun> node programs all started
>>> > Charmrun remote shell(localhost.0)> remote responding...
>>> > Charmrun remote shell(localhost.0)> starting node-program...
>>> > Charmrun remote shell(localhost.0)> remote shell phase successful.
>>> > Charmrun> Waiting for 0-th client to connect.
>>> > Charmrun> error attaching to node 'localhost':
>>> > Timeout waiting for node-program to connect
>>> >
>>> > This is the same error I kept getting all this time when I try to compile
>>> > it myself. The only thing I cannot figure out is how the precompiled
>>> > version works perfectly, but when I try to build from scratch it never
>>> > works.
>>> > Any thoughts on this?
>>> > Best,
>>> > AM
>>> >
>>> >
>>> > On Tue, Jan 29, 2019 at 12:42 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
>>> > wrote:
>>> >
>>> >> Hi Aravinda,
>>> >>
>>> >> Any particular reason you want to use the Intel compilers? Since your goal
>>> >> is to use CUDA anyway, and the integration between the CUDA toolkit and the
>>> >> Intel compilers tends to be hit or miss depending on the machine, I'd try
>>> >> the GNU compilers first (just drop the icc from the build line). If you can
>>> >> get that working, then you can spend a bit more time debugging exactly what
>>> >> your error messages mean. It could just be as simple as using iccstatic
>>> >> instead of icc, so that the libraries are bundled into the executable at
>>> >> compile time, which would solve your LD_LIBRARY_PATH issues.
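>>> >>
>>> >> Concretely, starting from the build line you posted, that would be one of
>>> >> (just a sketch):
>>> >>
>>> >>   ./build charm++ verbs-linux-x86_64 smp --with-production
>>> >>   ./build charm++ verbs-linux-x86_64 iccstatic smp --with-production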
>>> >>
>>> >> -Josh
>>> >>
>>> >>
>>> >>
>>> >> On 2019-01-29 09:42:41-07:00 owner-namd-l_at_ks.uiuc.edu wrote:
>>> >>
>>> >> Dear NAMD users and developers,
>>> >> I have recently attempted to compile the NAMD 2.13 nightly build to run
>>> >> multi-GPU-node replica exchange simulations using the REST2 methodology.
>>> >> First, I was able to run the current NAMD 2.13
>>> >> Linux-x86_64-verbs-smp-CUDA (multi-copy algorithms on InfiniBand) binaries
>>> >> with charmrun on our university cluster using a multi-node/GPU setup (with
>>> >> SLURM).
>>> >> Then, I tried compiling the NAMD 2.13 nightly version to use REST2 (since
>>> >> the current version has a bug with selecting solute atom IDs, as described
>>> >> here:
>>> >> https://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2018-2019/1424.html ),
>>> >> using the information on the NVIDIA site as well as what is mentioned in
>>> >> the release notes. But I failed miserably, as several others have (as I
>>> >> can see from the mailing thread). Since the precompiled binaries of the
>>> >> current version work perfectly, I cannot think of a reason why my attempts
>>> >> failed other than some issue related to the library files and compilers I
>>> >> am loading when building charm++ for the multi-node GPU setup. I have used
>>> >> the following flags to build charm++:
>>> >> ./build charm++ verbs-linux-x86_64 icc smp --with-production
>>> >> I have used ifort and the Intel/2018 compilers.
>>> >> One thing I have noticed is that with the precompiled namd2.13 I did not
>>> >> have to set LD_LIBRARY_PATH, but I had to do so when I compiled it myself
>>> >> (otherwise I keep getting missing library file errors).
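>>> >>
>>> >> (For completeness, the quick check I mean is something like the following,
>>> >> where the path to namd2 is just a placeholder for wherever the freshly
>>> >> built binary ends up:
>>> >>
>>> >>   ldd /path/to/namd2 | grep "not found"   # placeholder path
>>> >>
>>> >> and whichever directories contain the unresolved libraries then need to go
>>> >> onto LD_LIBRARY_PATH.)
>>> >>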
>>> >> It would be a great help if any of you who have successfully compiled
>>> >> multi-node GPU NAMD 2.13 could share your charm-6.8.2 files along with
>>> >> information on the compilers you used, so I could compile NAMD myself. Or
>>> >> any advice on how to solve this, or sharing precompiled NAMD 2.13 binaries
>>> >> for the nightly version itself, would be highly appreciated.
>>> >> Thank you,
>>> >> Best,
>>> >> --
>>> >> Aravinda Munasinghe,
>>> >>
>>> >>
>>> >
>>> > --
>>> > Aravinda Munasinghe,
>>> >
>>>
>>
>>
>> --
>> Aravinda Munasinghe,
>>
>>
>
> --
> Aravinda Munasinghe,
>
>

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin
