Re: namd ibverbs

From: Kevin C Chan (cchan2242-c_at_my.cityu.edu.hk)
Date: Thu Jul 23 2015 - 05:38:49 CDT

Dear Users,

I am also struggling with NAMD 2.10 ibverbs. I have been using the
pre-compiled ibverbs CUDA version and was able to execute it with the
default ssh remote shell. Here is my command:

$BINDIR/charmrun +p${NPROCS} ++nodelist $nodelist $BINDIR/namd2 config.namd

where $BINDIR points to the binary folder downloaded directly from the
website and $nodelist looks like this for a 2-node job:
group main ++cpus 16 +shell ssh
host compute-0-8
host compute-0-9

As a result, charmrun distributes namd2 across the 2 nodes, and on each node
it shows up as a single process at >1500% CPU usage in top.

However, this setup gives relatively unsatisfactory benchmarks, roughly half
the ns/day I get on a Cray machine. I have searched the mailing list and
tried a number of ++mpiexec options, but they give me weird outcomes (a
sketch of the full job script I am attempting follows the list):
1) If my mpiexec_wrapper is "exec mpiexec -machinefile $PBS_NODEFILE $*",
namd2 is split into 16 threads, each at 100% CPU usage (checked with top),
and in this case 2 nodes give no speed-up at all.
2) If my mpiexec_wrapper is "shift; shift; exec ibrun $*", it does not work
at all, because there is no ibrun command on my cluster.
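
Roughly, the job script I am attempting looks like the following sketch (the
processor count, the $BINDIR path and the +devices list are placeholders for
my own setup, and mpiexec_wrapper is the csh script from option 1):

#!/bin/bash
#PBS -l nodes=2:ppn=16
cd $PBS_O_WORKDIR

# mpiexec_wrapper is a separate executable csh file containing:
#   #!/bin/csh
#   exec mpiexec -machinefile $PBS_NODEFILE $*

# launch the ibverbs binary through mpiexec; +p32 = 2 nodes x 16 cores,
# +devices/+idlepoll are the usual namd2 GPU options
$BINDIR/charmrun +p32 ++mpiexec ++remote-shell mpiexec_wrapper \
    $BINDIR/namd2 +idlepoll +devices 0,1 config.namd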

I have also always had a question about my OFED library: does successful use
of a pre-compiled ibverbs CUDA NAMD build mean that the library is installed
correctly? And how can I tell that it is really using the InfiniBand fabric
rather than the Ethernet LAN?
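
So far the only checks I have come across are the following (a rough sketch;
mlx4_0 is just an example device name and will differ between clusters):

# list the OFED devices and port states; the ibverbs build needs a port in
# the PORT_ACTIVE state
ibv_devinfo | grep -E 'hca_id|state'

# watch this counter while NAMD is running; if it keeps growing, the job is
# moving data over the InfiniBand fabric rather than the Ethernet LAN
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data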

Thanks in advance for any experience shared.

Regards,
Kevin
City University of Hong Kong

On Fri, Jul 17, 2015 at 6:09 AM, Gianluca Interlandi <gianluca_at_u.washington.edu> wrote:

> Another tip. I found that compiling the NAMD code locally on your cluster,
> instead of using the downloaded binaries, can give you a boost of 5-10%. I
> added the flag "-xhost" to the compiler in both the charm++ and NAMD
> compilations. This flag tells the Intel compiler (gcc has similar options)
> to optimize for the local architecture. But that involves carefully reading
> notes.txt and finding out how to add the "-xhost" flag.
>
>
> Gianluca
>
> On Thu, 16 Jul 2015, Maxime Boissonneault wrote:
>
>> Thanks. I would never have found this file.
>>
>> I searched for things like README or USAGE, but not "notes.txt", which
>> looks more like an internal document meant for the developer team itself ;)
>>
>> Maxime
>>
>>
>> On 2015-07-16 17:20, Gianluca Interlandi wrote:
>>
>>> Also, apropos documentation. In the NAMD folder that you downloaded
>>> there is a file called "notes.txt". That file contains information on how
>>> to run NAMD (and also on how to compile it in case you download the source
>>> code). There, they have a section on "Linux Clusters with InfiniBand".
>>>
>>> I copy it here:
>>>
>>> -- Linux Clusters with InfiniBand or Other High-Performance Networks --
>>>
>>> Charm++ provides a special ibverbs network layer that uses InfiniBand
>>> networks directly through the OpenFabrics OFED ibverbs library. This
>>> avoids efficiency and portability issues associated with MPI. Look for
>>> pre-built ibverbs NAMD binaries or specify ibverbs when building Charm++.
>>>
>>> Writing batch job scripts to run charmrun in a queueing system can be
>>> challenging. Since most clusters provide directions for using mpiexec
>>> to launch MPI jobs, charmrun provides a ++mpiexec option to use mpiexec
>>> to launch non-MPI binaries. If "mpiexec -np <procs> ..." is not
>>> sufficient to launch jobs on your cluster you will need to write an
>>> executable mympiexec script like the following from TACC:
>>>
>>> #!/bin/csh
>>> shift; shift; exec ibrun $*
>>>
>>> The job is then launched (with full paths where needed) as:
>>>
>>> charmrun +p<procs> ++mpiexec ++remote-shell mympiexec namd2
>>> <configfile>
>>>
>>>
>>> Gianluca
>>>
>>> On Thu, 16 Jul 2015, Gianluca Interlandi wrote:
>>>
>>>> You might want to try different ways. You can try with
>>>>
>>>> ibrun $BINDIR/namd2 config.namd
>>>>
>>>> and see whether that is faster or slower than the method I first
>>>> indicated. I would first try on a single node.
>>>>
>>>> Gianluca
>>>>
>>>> On Thu, 16 Jul 2015, Maxime Boissonneault wrote:
>>>>
>>>> Hi,
>>>>> Yes, the goal is to see how that user can scale its NAMD computation
>>>>> on multiple GPU nodes. He already gets a good scaling on a single node.
>>>>> Each of our nodes have 8 x K20s.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Maxime
>>>>>
>>>>> On 2015-07-16 16:56, Gianluca Interlandi wrote:
>>>>>
>>>>>> Exactly, mpiexec is just a launcher. You will be using ibverbs and
>>>>>> not MPI. This is at least my understanding. I tried out different ways
>>>>>> of running NAMD with ibverbs and found that the method from Jim Phillips
>>>>>> (one of the NAMD developers) is the fastest.
>>>>>>
>>>>>> Are you running on multiple nodes?
>>>>>>
>>>>>> Gianluca
>>>>>>
>>>>>> On Thu, 16 Jul 2015, Maxime Boissonneault wrote:
>>>>>>
>>>>>>> Hi Gianluca,
>>>>>>>
>>>>>>> So, the NAMD ibverbs package requires MPI? I thought the whole
>>>>>>> point of an ibverbs build was to get rid of MPI and run directly over
>>>>>>> the InfiniBand verbs.
>>>>>>>
>>>>>>> Or is mpiexec only used as a launcher (and could the same be done
>>>>>>> through pdsh or something like that)?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Maxime
>>>>>>>
>>>>>>> On 2015-07-16 16:31, Gianluca Interlandi wrote:
>>>>>>>
>>>>>>>> Hi Maxime,
>>>>>>>>
>>>>>>>> I usually use the option "++mpiexec" to run NAMD with ibverbs (the
>>>>>>>> following is a single line):
>>>>>>>>
>>>>>>>> $BINDIR/charmrun +p${NPROCS} ++mpiexec ++remote-shell mpiexec_wrapper $BINDIR/namd2 config.namd
>>>>>>>>
>>>>>>>> $BINDIR is the path to your NAMD executable.
>>>>>>>>
>>>>>>>> "mpiexec_wrapper" is a script that I got a while ago from Jim
>>>>>>>> Philipps. You can copy the following lines to a file called
>>>>>>>> "mpiexec_wrapper" and make it executable:
>>>>>>>>
>>>>>>>> #!/bin/csh
>>>>>>>>
>>>>>>>> exec mpiexec -machinefile $PBS_NODEFILE $*
>>>>>>>>
>>>>>>>>
>>>>>>>> If you run on GPU as well, you should be able to simply add to the
>>>>>>>> line above:
>>>>>>>>
>>>>>>>> +devices 0,1,2 +idlepoll
>>>>>>>>
>>>>>>>> However, I need to say that if you are running on a single node and
>>>>>>>> you do not need to use the network, the multicore version might be slightly
>>>>>>>> faster than ibverbs. This is because you will not have the overhead of the
>>>>>>>> network libraries. I usually use ibverbs only if running on more than one
>>>>>>>> node.
>>>>>>>>
>>>>>>>> Gianluca
>>>>>>>>
>>>>>>>> On Thu, 16 Jul 2015, Maxime Boissonneault wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> One of our users would like to use NAMD 2.10 ibverbs with GPUs on
>>>>>>>>> our GPU cluster. I cannot find documentation on how to use it. Is there
>>>>>>>>> such documentation? We are used to either the multicore-CUDA version or
>>>>>>>>> the CUDA+MPI version, but since you now provide ibverbs+GPU binaries, we
>>>>>>>>> downloaded them and wanted to try them.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ---------------------------------
>>>>>>>>> Maxime Boissonneault
>>>>>>>>> Analyste de calcul - Calcul Québec, Université Laval
>>>>>>>>> Président - Comité de coordination du soutien à la recherche de
>>>>>>>>> Calcul Québec
>>>>>>>>> Ph. D. en physique
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> -----------------------------------------------------
>>>>>>>> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
>>>>>>>> +1 (206) 685 4435
>>>>>>>> http://artemide.bioeng.washington.edu/
>>>>>>>>
>>>>>>>> Research Assistant Professor at the Department of Bioengineering
>>>>>>>> at the University of Washington, Seattle WA U.S.A.
>>>>>>>> -----------------------------------------------------
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ---------------------------------
>>>>>>> Maxime Boissonneault
>>>>>>> Analyste de calcul - Calcul Québec, Université Laval
>>>>>>> Président - Comité de coordination du soutien à la recherche de
>>>>>>> Calcul Québec
>>>>>>> Ph. D. en physique
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> -----------------------------------------------------
>>>>>> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
>>>>>> +1 (206) 685 4435
>>>>>> http://artemide.bioeng.washington.edu/
>>>>>>
>>>>>> Research Assistant Professor at the Department of Bioengineering
>>>>>> at the University of Washington, Seattle WA U.S.A.
>>>>>> -----------------------------------------------------
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ---------------------------------
>>>>> Maxime Boissonneault
>>>>> Analyste de calcul - Calcul Québec, Université Laval
>>>>> Président - Comité de coordination du soutien à la recherche de Calcul
>>>>> Québec
>>>>> Ph. D. en physique
>>>>>
>>>>>
>>>>>
>>>> -----------------------------------------------------
>>>> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
>>>> +1 (206) 685 4435
>>>> http://artemide.bioeng.washington.edu/
>>>>
>>>> Research Assistant Professor at the Department of Bioengineering
>>>> at the University of Washington, Seattle WA U.S.A.
>>>> -----------------------------------------------------
>>>>
>>>
>>> -----------------------------------------------------
>>> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
>>> +1 (206) 685 4435
>>> http://artemide.bioeng.washington.edu/
>>>
>>> Research Assistant Professor at the Department of Bioengineering
>>> at the University of Washington, Seattle WA U.S.A.
>>> -----------------------------------------------------
>>>
>>
>>
>> --
>> ---------------------------------
>> Maxime Boissonneault
>> Analyste de calcul - Calcul Québec, Université Laval
>> Président - Comité de coordination du soutien à la recherche de Calcul
>> Québec
>> Ph. D. en physique
>>
>>
>>
> -----------------------------------------------------
> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
> +1 (206) 685 4435
> http://artemide.bioeng.washington.edu/
>
> Research Assistant Professor at the Department of Bioengineering
> at the University of Washington, Seattle WA U.S.A.
> -----------------------------------------------------
>
