Re: Not getting good speed up

From: Mauricio Carrillo Tripp (trippm_at_gmail.com)
Date: Sat Mar 12 2005 - 15:38:06 CST

I used the same set of cables with both switches (a new set).
The reason I can't use a Dell switch with the new cluster
is that Dell doesn't offer a switch with more than 24 ports (to the best of
my knowledge). Besides, I already bought the Cisco switch, and I at least
want to make sure there is nothing I'm missing in trying to make it work
properly before I throw it in the garbage :)

On Fri, 11 Mar 2005 15:27:56 -0800, Michael Grabe <mgrabe_at_itsa.ucsf.edu> wrote:
> Mauricio,
>
> If you know the problem is the switch, why don't you
> just use the one that gives you good performance?
>
> Why the additional question:
>
>
> > Any other thoughts or suggestions??
>
> -Michael
>
> > On Thu, 3 Mar 2005 14:44:55 -0500, Mauricio Carrillo Tripp
> > <trippm_at_gmail.com> wrote:
> >> sorry about that stupid mistake, it's building now...
> >>
> >> I tried the option +giga (and also +strategy USE_MESH and +strategy
> >> USE_GRID).
> >> All of them improved the scaling a little, but not enough.
> >> It is true that the new cluster is using Rocks, and that's why I want
> >> to compare the behaviour of all the different versions of charm++...
> >> I'll keep you all posted on my findings...
> >> Thanks.
> >>
> >>
> >> On Thu, 03 Mar 2005 13:34:56 -0600, Gengbin Zheng
> >> <gzheng_at_ks.uiuc.edu> wrote:
> >>>
> >>> For the UDP NAMD, use the command line option "+giga", which turns on
> >>> some settings (such as UDP window protocol settings) that can speed up
> >>> performance. You can give it a try.
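> >>>
> >>> For example, a run would look something like this (a minimal sketch;
> >>> the processor count, nodelist file and config file name are only
> >>> placeholders for your own setup):
> >>>
> >>>   charmrun namd2 +p16 ++nodelist nodelist +giga sim.conf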
> >>>
> >>> I noticed that the OS of your new cluster is Rocks. I am not familiar
> >>> with it, but I assume it does something different in the way it
> >>> launches a parallel job.
> >>>
> >>> The error you got when building charm:
> >>>
> >>> make: *** No rule to make target `charmm++'. Stop.
> >>> -------------------------------------------------
> >>> Charm++ NOT BUILT. Either cd into mpi-linux-icc/tmp and try
> >>>
> >>> This is because you misspelled charm++ as charmm++.
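> >>>
> >>> That is, the build command should read:
> >>>
> >>>   ./build charm++ mpi-linux icc --libdir="/opt/lam/intel/lib" --incdir="/opt/lam/intel/include"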
> >>>
> >>> Gengbin
> >>>
> >>> Mauricio Carrillo Tripp wrote:
> >>>
> >>>> Hi Gengbin, thanks for your answer. I did the comparison you
> >>>> recommended, TCP vs UDP (I didn't compile from source though, I used
> >>>> the executables NAMD supplies). The results are in Fig 2 at
> >>>> http://chem.acad.wabash.edu/~trippm/Clusters/performance.php.
> >>>> Indeed, I get an increase in performance, but not good enough.
> >>>> Using the TCP version on the old cluster (lg66) did show good scaling,
> >>>> but that's not the case for the new cluster (lgrocks).
> >>>> Any ideas why this is, anybody?
> >>>>
> >>>> I'm trying to compile different versions of charm++ (gcc, intel, tcp,
> >>>> udp, mpi) to compare them using converse/pingpong, but I'm having
> >>>> trouble building the mpi version. I haven't found an example of how
> >>>> to do it, and all I get is:
> >>>>
> >>>>   ./build charmm++ mpi-linux icc --libdir="/opt/lam/intel/lib" --incdir="/opt/lam/intel/include"
> >>>> Selected Compiler: icc
> >>>> Selected Options:
> >>>> Copying src/scripts/Makefile to mpi-linux-icc/tmp
> >>>> Soft-linking over bin
> >>>> Soft-linking over lib
> >>>> Soft-linking over lib_so
> >>>> Soft-linking over include
> >>>> Soft-linking over tmp
> >>>> Generating mpi-linux-icc/tmp/conv-mach-pre.sh
> >>>> Performing 'make charmm++ OPTS=' in mpi-linux-icc/tmp
> >>>> make: *** No rule to make target `charmm++'. Stop.
> >>>> -------------------------------------------------
> >>>> Charm++ NOT BUILT. Either cd into mpi-linux-icc/tmp and try
> >>>>
> >>>> Any help will be appreciated!
> >>>>
> >>>> Thanks again.
> >>>>
> >>>>
> >>>> On Wed, 02 Mar 2005 21:42:39 -0600, Gengbin Zheng
> >>>> <gzheng_at_ks.uiuc.edu> wrote:
> >>>>
> >>>>
> >>>>> Hi Mauricio,
> >>>>>
> >>>>> With the NAMD tcp version, Charm is not compiled on top of MPI; the
> >>>>> communication is based on native TCP sockets, that is, Charm++ itself
> >>>>> implements its message passing functions using TCP sockets.
> >>>>> I cannot provide a reason to explain why the scaling is so bad,
> >>>>> because I don't think it should behave like that.
> >>>>> You can do some tests running the Charm pingpong program (available
> >>>>> at charm/pgms/converse/pingpong) and see what the pingpong one-way
> >>>>> latency is compared with MPI.
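> >>>>>
> >>>>> (Roughly, and assuming a finished charm build in that tree -- the
> >>>>> exact paths here are only a sketch and may differ on your install:
> >>>>>
> >>>>>   cd charm/pgms/converse/pingpong
> >>>>>   make
> >>>>>   ./charmrun ./pgm +p2
> >>>>>
> >>>>> The reported one-way time can then be compared against a NetPIPE or
> >>>>> MPI pingpong number.)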
> >>>>>
> >>>>> In fact, I recommend you compile a UDP socket version of charm and
> >>>>> NAMD from source as a comparison (it is the net-linux version of
> >>>>> charm, and the Linux-i686 version of NAMD).
> >>>>> We have seen NAMD run with good scaling on gigabit ethernet.
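> >>>>>
> >>>>> (Something along the lines of "./build charm++ net-linux -O" for the
> >>>>> default gcc build, or "./build charm++ net-linux icc -O" for the
> >>>>> Intel compiler, and then a Linux-i686 NAMD built from source on top
> >>>>> of it; the compiler choice and -O flag here are just a suggestion.)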
> >>>>>
> >>>>> Gengbin
> >>>>>
> >>>>> Mauricio Carrillo Tripp wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Some time ago I started using NAMD on a 16-node cluster
> >>>>>> with good results. I downloaded the executables (the tcp version,
> >>>>>> the recommended one for gigabit networks) and everything
> >>>>>> ran smoothly. The speedup was good (see Fig 1 at
> >>>>>> http://chem.acad.wabash.edu/~trippm/Clusters/performance.php),
> >>>>>> although maybe it could be improved, which brings me to the real
> >>>>>> issue: we got a new cluster, I did the same as before, but
> >>>>>> I noticed that the simulations were running a lot slower.
> >>>>>> I did the same analysis as I did with the old cluster
> >>>>>> and found that the speedup was just terrible. I tried the other
> >>>>>> executable version of NAMD 2.5 and things went a little better,
> >>>>>> but not quite as good as I know they could've been (see Fig 2 at
> >>>>>> http://chem.acad.wabash.edu/~trippm/Clusters/performance.php).
> >>>>>> I also found a big difference between MPICH and LAM/MPI. The
> >>>>>> latter is the only MPI library installed on the old cluster.
> >>>>>> So, these results clearly show that the problem lies in the
> >>>>>> communication (the CPU speedup is good), and they suggest that
> >>>>>> charm++ is behaving like MPICH (or worse). But I don't know the
> >>>>>> details of how charm++ works, i.e., does it rely on the MPI
> >>>>>> libraries? If so, how can I tell it which one to use?
> >>>>>> If not, how can I optimize its performance? (Is there a way to
> >>>>>> measure it in a similar way to what NetPIPE does?) I would like
> >>>>>> to take maximum advantage when running on 32 processors...
> >>>>>>
> >>>>>> Any advice will be appreciated. Thanks.
> >>>>>>
> >>>>
> >>>
> >>
> >> --
> >> Mauricio Carrillo Tripp, PhD
> >> Department of Chemistry
> >> Wabash College
> >> trippm_at_wabash.edu
> >>
> >
> >
> > --
> >
> > Mauricio Carrillo Tripp, PhD
> > Department of Chemistry
> > Wabash College
> > trippm_at_wabash.edu
> > http://chem.acad.wabash.edu/~trippm
> >
> >
> ------------------------------------------------------------------------
> Michael Grabe, Ph.D.
> HHMI/UCSF
> Genetics Development & Behavioral Science Building
> 1550 4th Street, GD 481
> San Francisco, CA 94143-0725
> tel: ++ 415.476.0421
> http://itsa.ucsf.edu/~mgrabe
>
>

-- 
Mauricio Carrillo Tripp, PhD
Department of Chemistry
Wabash College
trippm_at_wabash.edu
http://chem.acad.wabash.edu/~trippm
