Re: Compilation of Charm++ then NAMD 2.6 on Opteron with Portland compilers

From: Philip Peartree (p.peartree_at_postgrad.manchester.ac.uk)
Date: Fri Dec 14 2007 - 07:29:01 CST

I've rerun on a compute node (the previous test was on the cluster head node) and I get what I think are much better results:

test 0: initiated [groupring (milind)]
test 0: completed (0.00 sec)
test 1: initiated [nodering (milind)]
test 1: completed (0.00 sec)
test 2: initiated [varsizetest (mjlang)]
test 2: completed (0.00 sec)
test 3: initiated [varraystest (milind)]
test 3: completed (0.00 sec)
test 4: initiated [groupcast (mjlang)]
test 4: completed (0.00 sec)
test 5: initiated [nodecast (milind)]
test 5: completed (0.00 sec)
test 6: initiated [synctest (mjlang)]
test 6: completed (0.01 sec)
test 7: initiated [fib (jackie)]
test 7: completed (0.00 sec)
test 8: initiated [arrayring (fang)]
test 8: completed (0.01 sec)
test 9: initiated [tempotest (fang)]
test 9: completed (0.00 sec)
test 10: initiated [packtest (fang)]
test 10: completed (0.00 sec)
test 11: initiated [queens (jackie)]
test 11: completed (0.02 sec)
test 12: initiated [migration (jackie)]
test 12: completed (0.00 sec)
test 13: initiated [marshall (olawlor)]
test 13: completed (0.02 sec)
test 14: initiated [priomsg (fang)]
test 14: completed (0.00 sec)
test 15: initiated [priotest (mlind)]
test 15: completed (0.00 sec)
test 16: initiated [rotest (milind)]
test 16: completed (0.00 sec)
test 17: initiated [statistics (olawlor)]
test 17: completed (0.00 sec)
test 18: initiated [templates (milind)]
test 18: completed (0.00 sec)
test 19: initiated [inherit (olawlor)]
test 19: completed (0.00 sec)
test 20: initiated [reduction (olawlor)]
test 20: completed (0.00 sec)
test 21: initiated [callback (olawlor)]
test 21: completed (0.00 sec)
test 22: initiated [immediatering (gengbin)]
test 22: completed (0.03 sec)
test 23: initiated [bitvector (jbooth)]
test 23: completed (0.00 sec)
test 24: initiated [multi groupring (milind)]
test 24: completed (0.02 sec)
test 25: initiated [multi nodering (milind)]
test 25: completed (0.02 sec)
test 26: initiated [multi varsizetest (mjlang)]
test 26: completed (0.00 sec)
test 27: initiated [multi varraystest (milind)]
test 27: completed (0.00 sec)
test 28: initiated [multi groupcast (mjlang)]
test 28: completed (0.00 sec)
test 29: initiated [multi nodecast (milind)]
test 29: completed (0.00 sec)
test 30: initiated [multi synctest (mjlang)]
test 30: completed (0.03 sec)
test 31: initiated [multi fib (jackie)]
test 31: completed (0.03 sec)
test 32: initiated [multi arrayring (fang)]
test 32: completed (0.03 sec)
test 33: initiated [multi tempotest (fang)]
test 33: completed (0.00 sec)
test 34: initiated [multi packtest (fang)]
test 34: completed (0.00 sec)
test 35: initiated [multi migration (jackie)]
test 35: completed (0.00 sec)
test 36: initiated [multi marshall (olawlor)]
test 36: completed (0.06 sec)
test 37: initiated [multi priomsg (fang)]
test 37: completed (0.00 sec)
test 38: initiated [multi priotest (mlind)]
test 38: completed (0.00 sec)
test 39: initiated [multi statistics (olawlor)]
test 39: completed (0.00 sec)
test 40: initiated [multi reduction (olawlor)]
test 40: completed (0.00 sec)
test 41: initiated [multi callback (olawlor)]
test 41: completed (0.00 sec)
test 42: initiated [multi immediatering (gengbin)]
test 42: completed (0.13 sec)
test 43: initiated [all-at-once]
test 43: completed (0.08 sec)
All tests completed, exiting
End of program

That was from 2 CPUs; the 4-CPU run looks similar.

The speedup is immense. Could this indicate an issue somewhere?
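
One way to put numbers on the difference is to pull the per-test timings out of both megatest logs and line them up. The following is a minimal sketch, not part of the original exchange: the function name `parse_megatest` and the variable names are made up for illustration, and the two log excerpts are copied from the runs quoted in this thread. Note that the two runs are not like for like (the head-node run used 4 processes, the compute-node run 2 CPUs), and that megatest reports times to 0.01 sec, so differences are more meaningful than ratios.

```python
import re

# Matches records such as "test 6: completed (0.01 sec)".
COMPLETED = re.compile(r"test (\d+): completed \(([0-9.]+) sec\)")


def parse_megatest(log):
    """Return {test number: elapsed seconds} for each completed test."""
    # findall scans the whole text, so records that share a line with
    # other output are still picked up.
    return {int(num): float(sec) for num, sec in COMPLETED.findall(log)}


# Excerpt from the earlier head-node run (4 processes).
head_node_log = """
test 0: completed (45.86 sec)
test 1: completed (45.36 sec)
test 2: completed (0.27 sec)
test 3: completed (0.32 sec)
test 4: completed (1.57 sec)
test 5: completed (0.96 sec)
"""

# Excerpt from the compute-node run (2 CPUs).
compute_node_log = """
test 0: completed (0.00 sec)
test 1: completed (0.00 sec)
test 2: completed (0.00 sec)
test 3: completed (0.00 sec)
test 4: completed (0.00 sec)
test 5: completed (0.00 sec)
"""

head = parse_megatest(head_node_log)
node = parse_megatest(compute_node_log)

# Compare only the tests that completed in both runs.
for test in sorted(set(head) & set(node)):
    saved = head[test] - node[test]
    print(f"test {test}: {head[test]:6.2f} s -> {node[test]:5.2f} s (saved {saved:.2f} s)")

print(f"head node total:    {sum(head.values()):.2f} s")
print(f"compute node total: {sum(node.values()):.2f} s")
```

Almost all of the head-node time sits in the two ring tests (tests 0 and 1), so those are the ones worth re-checking if the slow behaviour comes back.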

Philip Peartree
University of Manchester

----- Original Message -----
From: "Brian Bennion" <bennion1_at_llnl.gov>
To: "Philip Peartree" <p.peartree_at_postgrad.manchester.ac.uk>
Sent: 13 December 2007 00:03:05 o'clock (GMT) Europe/London
Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron with Portland compilers

I can run the comparison on our machines for 1-8 CPUs on a single node.
The hardware we use and its specs are on this website:

computing.llnl.gov
zeus

Brian

At 03:47 PM 12/12/2007, you wrote:
>I had tried the pthreads bit, and it didn't make much difference.
>That was with me mistakenly running 4 processes on 2 processors on
>one (head) node... I can try running across multiple nodes if that helps?
>
>Thanks
>
>
>----- Original Message -----
>From: "Brian Bennion" <bennion1_at_llnl.gov>
>To: "Philip Peartree" <p.peartree_at_postgrad.manchester.ac.uk>
>Sent: 12 December 2007 23:01:57 o'clock (GMT) Europe/London
>Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron
>with Portland compilers
>
>Yes, the threads bit is important too, and it was my next suggestion.
>
>The timings are indeed slow. Can you run an 8-CPU megatest and see
>what you get? I have just recently finished a tortuous compile for
>our clusters here, hence the wiki entry.
>I am assuming the megatest data that you just emailed was from 1 node with 2 CPUs?
>
>Brian
>
>At 02:54 PM 12/12/2007, you wrote:
> >Hi Brian,
> >
> >I believe it's a 3Com GigE switch (I realised I'd forgotten to put
> >the specs of my cluster on). I've used your tip from the Wiki, which
> >seems to have worked. I think there is a setting in conv-mach.h to
> >change from gnu-malloc to the system malloc; will this help? I don't
> >know if you've seen my other posts, but I posted some times from
> >megatest which seemed slow to me, though I don't actually know.
> >
> >Philip Peartree
> >
> >P.S. It is the Portland Group compiler, version 6.0
> >
> >----- Original Message -----
> >From: "Brian Bennion" <bennion1_at_llnl.gov>
> >To: "Philip Peartree" <p.peartree_at_postgrad.manchester.ac.uk>
> >Sent: 12 December 2007 21:08:05 o'clock (GMT) Europe/London
> >Subject: Re: namd-l: Compilation of Charm++ then NAMD 2.6 on Opteron
> >with Portland compilers
> >
> >Which compiler are you using? Is it the Portland Group compiler?
> >What switches are on your cluster?
> >Basically, if you are using an MPI layer then you need to tell Charm++
> >to let the system handle memory allocation, not gnu-malloc.
> >So if you can give more information on your setup, I might be able to
> >fine-tune my hints.
> >
> >brian
> >
> >
> >At 07:20 AM 12/12/2007, you wrote:
> > >Hi,
> > >
> > >I'm having some issues compiling Charm++/NAMD on our Opteron-based
> > >cluster. I've read somewhere that it is best to use the fastest
> > >compiler. I think I've just about got Charm++ working: it works
> > >fine with simplearrayhello, and I've compiled the megatest
> > >program. The only error I got was pgCC reporting that
> > >-rdynamic is an unknown switch. If I run on 4 procs I get the
> > >following output:
> > >
> > >Megatest is running on 4 processors.
> > >test 0: initiated [groupring (milind)]
> > >test 0: completed (45.86 sec)
> > >test 1: initiated [nodering (milind)]
> > >test 1: completed (45.36 sec)
> > >test 2: initiated [varsizetest (mjlang)]
> > >test 2: completed (0.27 sec)
> > >test 3: initiated [varraystest (milind)]
> > >test 3: completed (0.32 sec)
> > >test 4: initiated [groupcast (mjlang)]
> > >test 4: completed (1.57 sec)
> > >test 5: initiated [nodecast (milind)]
> > >test 5: completed (0.96 sec)
> > >test 6: initiated [synctest (mjlang)]
> > >**ERROR: in routine alloca() there is a
> > >stack overflow: thread 0, max 535822332KB, used 2KB, request -1082359024B
> > >
> > >
> > >Could anyone elaborate on what's wrong here? I have heard that the
> > >Megatest program tests things that are fairly crucial to NAMD's running.
> > >
> > >Philip Peartree
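
One observation on the alloca() error quoted above: a negative `request` size is what a large 32-bit value looks like when it is printed as a signed integer. The sketch below reinterprets the reported number as an unsigned 32-bit quantity. Whether the PGI runtime really holds the request in a 32-bit field is an assumption, not something this thread confirms, and the helper name `as_unsigned32` is made up for illustration.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


# The request size reported in the alloca() error message, in bytes.
reported_request = -1082359024

unsigned_request = as_unsigned32(reported_request)

print(unsigned_request)                       # 3212608272
print(hex(unsigned_request))                  # 0xbf7c8310
print(f"{unsigned_request / 2**30:.2f} GiB")  # 2.99 GiB
```

Under that assumption the request is roughly 3 GiB, far more than any sensible stack allocation, which would point at a bad length being passed to alloca() rather than a genuine shortage of stack. The thread itself does not establish the cause.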
> >
> >Biosciences and Biotechnology Division
> >CMELS
> >Lawrence Livermore National Laboratory
> >Phone: 925-422-5722
> >Fax: 925-424-4334
>

