Re: Compilation of Charm++ then NAMD 2.6 on Opteron with Portland compilers

From: Dow_Hurst (dhurst_at_mindspring.com)
Date: Sat Dec 15 2007 - 21:49:11 CST

I wanted to point out that many Gig-E switches are not non-blocking.  For good performance you want a switch that provides full-duplex, full-bandwidth connections to all ports simultaneously, so check the backplane bandwidth in the switch's specs.  If the switch is spec'd as "non-blocking" it should be fine.  Low latency on the switch is also important.  The SMC 8624 was a great low-cost switch that met the basic requirements for running NAMD well.  A managed switch can be expensive and yet still not be non-blocking.  For a basic Gig-E cluster, get two switches and run two networks: one for all services other than MPI, and one for MPI alone.
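
One way to steer the MPI traffic onto its dedicated network is to name the hosts by their MPI-side interface.  With a net-* build of Charm++ that goes in charmrun's nodelist (with an MPI build, the mpirun machinefile plays the same role).  A minimal sketch, where the -mpi hostnames and paths are illustrative:

    group main
    host node01-mpi
    host node02-mpi

    ./charmrun ++nodelist ./nodelist +p8 ./namd2 sim.namd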
FYI,
Dow


-----Original Message-----
From: Philip Peartree
Sent: Dec 14, 2007 5:01 PM
To: Brian Bennion
Cc: namd-l
Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron with Portland compilers

Will do.  We have 4 processors per node, with varying speeds (half are 2 GHz, the others 1.8 GHz), but I guess the benchmarks are mainly held back by the interconnect.

I'll let you know when I get results.

Philip


----- Original Message -----
From: "Brian Bennion" <bennion1@llnl.gov>
To: "Philip Peartree" <p.peartree@postgrad.manchester.ac.uk>
Sent: 14 December 2007 18:49:28 o'clock (GMT) Europe/London
Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron with Portland compilers

Hello Philip,

Would you mind terribly much running a few scaling tests?  We are
anticipating upgrading one of our older clusters and are curious to
know whether just upgrading that cluster's network backbone to GigE
would give more bang for our buck.

If you could use the apoa1 benchmark and test in 1-node increments,
I would be grateful.
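
A sketch of such a run, assuming the 4 processors per node mentioned
earlier in the thread and illustrative paths:

    # run apoa1 on 1..4 nodes at 4 processors per node, then pull the timings
    for n in 1 2 3 4; do
      p=$((n * 4))
      ./charmrun +p$p ++nodelist ./nodelist ./namd2 apoa1/apoa1.namd > apoa1_${n}node.log
      grep "Benchmark time" apoa1_${n}node.log
    done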

Thank you
Brian

At 10:52 AM 12/14/2007, you wrote:
>I found that it ran OK on 2 CPUs on the head node today; I guess
>there was another process tying the CPUs up.  Nevertheless, I'm not
>going to run on it.  I've already compiled NAMD and it seems to be
>OK for a GigE switch.  Thanks for all the help.
>
>Philip
>
>
>----- Original Message -----
>From: "Brian Bennion" <bennion1@llnl.gov>
>To: "Philip Peartree" <p.peartree@postgrad.manchester.ac.uk>
>Sent: 14 December 2007 17:47:56 o'clock (GMT) Europe/London
>Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron
>with Portland compilers
>
>Hello Philip,
>
>As far as I can tell, it just means that running NAMD on the head
>node will not be a good idea.
>Most of the tests in megatest will not scale well with increasing
>numbers of CPUs.  It's best to move on to the NAMD side of things: get
>it compiled on this workable version of Charm++ and do some apoa1
>benchmarks to test your GigE switch.
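
For reference, the NAMD 2.6 side of that is short once Charm++ builds.
A sketch: point CHARMBASE in Make.charm at your Charm++ tree first, and
note that the arch name Linux-amd64-MPI is illustrative:

    ./config Linux-amd64-MPI
    cd Linux-amd64-MPI
    make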
>
>Brian
>
>At 05:29 AM 12/14/2007, you wrote:
> >I've rerun on a compute node (the previous test was on the cluster
> >head node) and I get what I think is much better results:
> >
> >test 0: initiated [groupring (milind)]
> >test 0: completed (0.00 sec)
> >test 1: initiated [nodering (milind)]
> >test 1: completed (0.00 sec)
> >test 2: initiated [varsizetest (mjlang)]
> >test 2: completed (0.00 sec)
> >test 3: initiated [varraystest (milind)]
> >test 3: completed (0.00 sec)
> >test 4: initiated [groupcast (mjlang)]
> >test 4: completed (0.00 sec)
> >test 5: initiated [nodecast (milind)]
> >test 5: completed (0.00 sec)
> >test 6: initiated [synctest (mjlang)]
> >test 6: completed (0.01 sec)
> >test 7: initiated [fib (jackie)]
> >test 7: completed (0.00 sec)
> >test 8: initiated [arrayring (fang)]
> >test 8: completed (0.01 sec)
> >test 9: initiated [tempotest (fang)]
> >test 9: completed (0.00 sec)
> >test 10: initiated [packtest (fang)]
> >test 10: completed (0.00 sec)
> >test 11: initiated [queens (jackie)]
> >test 11: completed (0.02 sec)
> >test 12: initiated [migration (jackie)]
> >test 12: completed (0.00 sec)
> >test 13: initiated [marshall (olawlor)]
> >test 13: completed (0.02 sec)
> >test 14: initiated [priomsg (fang)]
> >test 14: completed (0.00 sec)
> >test 15: initiated [priotest (mlind)]
> >test 15: completed (0.00 sec)
> >test 16: initiated [rotest (milind)]
> >test 16: completed (0.00 sec)
> >test 17: initiated [statistics (olawlor)]
> >test 17: completed (0.00 sec)
> >test 18: initiated [templates (milind)]
> >test 18: completed (0.00 sec)
> >test 19: initiated [inherit (olawlor)]
> >test 19: completed (0.00 sec)
> >test 20: initiated [reduction (olawlor)]
> >test 20: completed (0.00 sec)
> >test 21: initiated [callback (olawlor)]
> >test 21: completed (0.00 sec)
> >test 22: initiated [immediatering (gengbin)]
> >test 22: completed (0.03 sec)
> >test 23: initiated [bitvector (jbooth)]
> >test 23: completed (0.00 sec)
> >test 24: initiated [multi groupring (milind)]
> >test 24: completed (0.02 sec)
> >test 25: initiated [multi nodering (milind)]
> >test 25: completed (0.02 sec)
> >test 26: initiated [multi varsizetest (mjlang)]
> >test 26: completed (0.00 sec)
> >test 27: initiated [multi varraystest (milind)]
> >test 27: completed (0.00 sec)
> >test 28: initiated [multi groupcast (mjlang)]
> >test 28: completed (0.00 sec)
> >test 29: initiated [multi nodecast (milind)]
> >test 29: completed (0.00 sec)
> >test 30: initiated [multi synctest (mjlang)]
> >test 30: completed (0.03 sec)
> >test 31: initiated [multi fib (jackie)]
> >test 31: completed (0.03 sec)
> >test 32: initiated [multi arrayring (fang)]
> >test 32: completed (0.03 sec)
> >test 33: initiated [multi tempotest (fang)]
> >test 33: completed (0.00 sec)
> >test 34: initiated [multi packtest (fang)]
> >test 34: completed (0.00 sec)
> >test 35: initiated [multi migration (jackie)]
> >test 35: completed (0.00 sec)
> >test 36: initiated [multi marshall (olawlor)]
> >test 36: completed (0.06 sec)
> >test 37: initiated [multi priomsg (fang)]
> >test 37: completed (0.00 sec)
> >test 38: initiated [multi priotest (mlind)]
> >test 38: completed (0.00 sec)
> >test 39: initiated [multi statistics (olawlor)]
> >test 39: completed (0.00 sec)
> >test 40: initiated [multi reduction (olawlor)]
> >test 40: completed (0.00 sec)
> >test 41: initiated [multi callback (olawlor)]
> >test 41: completed (0.00 sec)
> >test 42: initiated [multi immediatering (gengbin)]
> >test 42: completed (0.13 sec)
> >test 43: initiated [all-at-once]
> >test 43: completed (0.08 sec)
> >All tests completed, exiting
> >End of program
> >
> >
> >That was from 2 CPUs; 4 CPUs looks similar.
> >
> >The speedup is immense; could this indicate an issue somewhere?
> >
> >Philip Peartree
> >University of Manchester
> >
> >
> >
> >----- Original Message -----
> >From: "Brian Bennion" <bennion1@llnl.gov>
> >To: "Philip Peartree" <p.peartree@postgrad.manchester.ac.uk>
> >Sent: 13 December 2007 00:03:05 o'clock (GMT) Europe/London
> >Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron
> >with Portland compilers
> >
> >I can run the comparison on our machines for 1-8 CPUs on a single node.
> >The hardware we use and its specs are at this website:
> >
> >computing.llnl.gov
> >(the zeus cluster)
> >
> >Brian
> >
> >At 03:47 PM 12/12/2007, you wrote:
> > >I had tried the pthreads bit, and it didn't make much difference.
> > >That was with me mistakenly running 4 processes on 2 processors on
> > >one (head) node... I can try running across multiple nodes if that would help.
> > >
> > >Thanks
> > >
> > >
> > >----- Original Message -----
> > >From: "Brian Bennion" <bennion1@llnl.gov>
> > >To: "Philip Peartree" <p.peartree@postgrad.manchester.ac.uk>
> > >Sent: 12 December 2007 23:01:57 o'clock (GMT) Europe/London
> > >Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron
> > >with Portland compilers
> > >
> > >Yes, the threads bit is important too, and it was my next suggestion.
> > >
> > >The timings are indeed slow.  Can you run an 8-CPU megatest and see
> > >what you get?  I have just recently finished a tortuous compile for
> > >our clusters here, hence the wiki entry.
> > >I am assuming the megatest data that you just emailed was on 1 node
> > >with 2 CPUs?
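
Under an MPI build, an 8-CPU megatest run would look roughly like this
(a sketch; pgm is the binary name megatest's makefile typically produces):

    mpirun -np 8 ./pgm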
> > >
> > >Brian
> > >
> > >At 02:54 PM 12/12/2007, you wrote:
> > > >Hi Brian,
> > > >
> > > >I believe it's a 3Com GigE switch (I realised I'd forgotten to put
> > > >the specs of my cluster in my post).  I've used your tip from the Wiki,
> > > >which seems to have worked.  I think there is a setting in conv-mach.h
> > > >to change from gnu-malloc to the system malloc; will this help?  I
> > > >don't know if you've seen my other posts, but I posted some timings
> > > >from megatest which seemed slow to me, though I don't actually know.
> > > >
> > > >Philip Peartree
> > > >
> > > >P.S. It is the Portland Group compiler, version 6.0.
> > > >
> > > >----- Original Message -----
> > > >From: "Brian Bennion" <bennion1@llnl.gov>
> > > >To: "Philip Peartree" <p.peartree@postgrad.manchester.ac.uk>
> > > >Sent: 12 December 2007 21:08:05 o'clock (GMT) Europe/London
> > > >Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron
> > > >with Portland compilers
> > > >
> > > >Which compiler are you using?  Is it the Portland Group compiler?
> > > >What switches are on your cluster?
> > > >Basically, if you are using an MPI layer then you need to tell Charm++
> > > >to let the system handle memory allocation rather than gnu-malloc.
> > > >So if you can give more information on your setup I might be able to
> > > >fine-tune my hints.
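
That choice is typically made at link time: charmc takes a -memory
option, and -memory os selects the system allocator instead of
gnu-malloc.  A sketch, with an illustrative build target:

    # build an MPI-flavoured Charm++ (target name is illustrative)
    ./build charm++ mpi-linux-amd64 -O
    # when linking a Charm++ program, request the OS allocator
    charmc -memory os -o pgm megatest.o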
> > > >
> > > >brian
> > > >
> > > >
> > > >At 07:20 AM 12/12/2007, you wrote:
> > > > >Hi,
> > > > >
> > > > >I'm having some issues compiling Charm++/NAMD on our Opteron-based
> > > > >cluster.  I've read somewhere that it is best to use the fastest
> > > > >compiler.  I think I've just about got Charm++ working: it works
> > > > >fine with simplearrayhello, and I've compiled the megatest
> > > > >program.  The only error I got was from pgCC, saying that
> > > > >-rdynamic is an unknown switch.  If I run on 4 procs I get the
> > > > >following output:
> > > > >
> > > > >Megatest is running on 4 processors.
> > > > >test 0: initiated [groupring (milind)]
> > > > >test 0: completed (45.86 sec)
> > > > >test 1: initiated [nodering (milind)]
> > > > >test 1: completed (45.36 sec)
> > > > >test 2: initiated [varsizetest (mjlang)]
> > > > >test 2: completed (0.27 sec)
> > > > >test 3: initiated [varraystest (milind)]
> > > > >test 3: completed (0.32 sec)
> > > > >test 4: initiated [groupcast (mjlang)]
> > > > >test 4: completed (1.57 sec)
> > > > >test 5: initiated [nodecast (milind)]
> > > > >test 5: completed (0.96 sec)
> > > > >test 6: initiated [synctest (mjlang)]
> > > > >**ERROR: in routine alloca() there is a stack overflow:
> > > > >thread 0, max 535822332KB, used 2KB, request -1082359024B
> > > > >
> > > > >
> > > > >Could anyone elaborate on what's wrong here?  I heard that the
> > > > >megatest program tests things that are fairly crucial to
> > > > >NAMD's operation.
> > > > >
> > > > >Philip Peartree
> > > >
> > > >Biosciences and Biotechnology Division
> > > >CMELS
> > > >Lawrence Livermore National Laboratory
> > > >Phone: 925-422-5722
> > > >Fax: 925-424-4334
> > >
> > >Biosciences and Biotechnology Division
> > >CMELS
> > >Lawrence Livermore National Laboratory
> > >Phone: 925-422-5722
> > >Fax: 925-424-4334
> >
> >Biosciences and Biotechnology Division
> >CMELS
> >Lawrence Livermore National Laboratory
> >Phone: 925-422-5722
> >Fax: 925-424-4334
>
>Biosciences and Biotechnology Division
>CMELS
>Lawrence Livermore National Laboratory
>Phone: 925-422-5722
>Fax: 925-424-4334

Biosciences and Biotechnology Division
CMELS
Lawrence Livermore National Laboratory
Phone: 925-422-5722
Fax: 925-424-4334

