Re: Illegal instruction signal at startup. (with net-rs6k smp)

From: Brian Bennion (brian_at_youkai.llnl.gov)
Date: Fri Apr 09 2004 - 15:14:55 CDT

Hi,

Sorry to butt in, but doesn't the +p2 argument require namd2 to be
launched through charmrun?

i.e.,

charmrun ++local namd2 +p2 alanin.namd
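
Separately, on the null h->hdlr reported in the quoted message below: the
"Illegal instruction in . at 0x0" in the dbx log is consistent with a call
through a null function pointer. Here is a minimal C sketch of a handler
table with a guarded dispatch, just to illustrate the failure mode. Only the
field names hdlr and userPtr come from the quoted line; handler_info,
register_handler, dispatch, count_handler and MAX_HANDLERS are hypothetical
names made up for this example and are not the Converse source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler-table entry. Only the field names hdlr and userPtr
 * are taken from the quoted line; everything else is illustrative. */
typedef void (*handler_fn)(void *msg, void *userPtr);

typedef struct {
    handler_fn hdlr;
    void *userPtr;
} handler_info;

#define MAX_HANDLERS 8

/* Static storage is zero-initialized, so every hdlr starts out NULL. */
static handler_info table[MAX_HANDLERS];

/* Store fn in the first free slot. Returns the slot index, or -1 if the
 * table is full. Registering a NULL handler is treated as a caller bug. */
static int register_handler(handler_fn fn, void *userPtr)
{
    assert(fn != NULL);
    for (int i = 0; i < MAX_HANDLERS; i++) {
        if (table[i].hdlr == NULL) {
            table[i].hdlr = fn;
            table[i].userPtr = userPtr;
            return i;
        }
    }
    return -1;
}

/* Guarded dispatch: report an empty or out-of-range slot instead of
 * jumping through a null pointer. Returns 0 on success, -1 otherwise. */
static int dispatch(int idx, void *msg)
{
    if (idx < 0 || idx >= MAX_HANDLERS)
        return -1;
    handler_info *h = &table[idx];
    if (h->hdlr == NULL) {
        fprintf(stderr, "no handler registered for index %d\n", idx);
        return -1;
    }
    (h->hdlr)(msg, h->userPtr);
    return 0;
}

/* Example handler: increments the int that userPtr points to. */
static void count_handler(void *msg, void *userPtr)
{
    (void)msg;
    (*(int *)userPtr)++;
}
```

With this sketch, calling dispatch() on a slot before register_handler() has
filled it returns -1 rather than crashing. If the second thread can reach the
scheduler before its handler table is populated, an unguarded call would fail
in the way the trace shows, but that is only a guess on my part.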

Brian

On Fri, 9 Apr 2004, Hansang Bae wrote:

> I tried your fix but it didn't work.
> Actually, I narrowed down the place where it crashes.
>
> I'm running namd with command line:
> namd2 +p2 alanin.namd
>
> The error occurs in the second thread when it tries to execute
> (h->hdlr)(msg,h->userPtr); (line 938 in convcore.c)
>
> where both h->hdlr and h->userPtr are null. (I think h->hdlr is the crucial one.)
>
> Do you have any idea?
>
> Thanks,
> Hansang Bae
>
> On Thu, 8 Apr 2004, Gengbin Zheng wrote:
>
> >
> > Hi Hansang,
> >
> > It seems that there is some problem with the new built-in GNU malloc of
> > Charm++. Please check whether this fixes it:
> >
> > Edit charm/net-rs6k-smp/tmp/conv-mach-smp.h and add this:
> >
> > #undef CMK_MALLOC_USE_GNU_MALLOC
> > #undef CMK_MALLOC_USE_OS_BUILTIN
> > #define CMK_MALLOC_USE_OS_BUILTIN 1
> >
> > Do a clean build (make clean, then make charm++ OPTS=-g)
> > and relink namd2.
> >
> > Please let me know if this works or not,
> >
> > Gengbin
> >
> > On Thu, 8 Apr 2004, Gengbin Zheng wrote:
> >
> > >
> > > I see. Could you send me the command-line options that produce this crash?
> > > I suppose this is alanin.
> > >
> > > Gengbin
> > >
> > >
> > > On Thu, 8 Apr 2004, Hansang Bae wrote:
> > >
> > > > Of course, I compiled this version with the -g option. The other versions,
> > > > net-rs6k and mpi-sp, do not have any problems. I'm using tcl-8.4.4 and
> > > > fftw-2.1.5.
> > > >
> > > > Thanks,
> > > > Hansang Bae
> > > > 1285 EE Building, Mail Box #58
> > > > West Lafayette, IN 47907-1285
> > > > (H) 765-496-4729
> > > > (L) 765-494-3550 (EE 347)
> > > >
> > > > On Thu, 8 Apr 2004, Gengbin Zheng wrote:
> > > >
> > > > >
> > > > >
> > > > > It is a little hard to tell what is wrong here. I would suggest building
> > > > > your own binary (there may be a binary or library incompatibility problem).
> > > > > As alternatives, you can try net-rs6k (without smp) or an MPI version
> > > > > like mpi-sp|IBM-SP.
> > > > >
> > > > > Gengbin
> > > > >
> > > > > On Tue, 6 Apr 2004, Hansang Bae wrote:
> > > > >
> > > > > > I have a problem running the AIX-RS6000-SMP version with multiple threads.
> > > > > > It crashes with an illegal instruction exception during the startup phase.
> > > > > > The strange thing is that sometimes this doesn't happen.
> > > > > >
> > > > > > Here is "some" information from dbx log.
> > > > > >
> > > > > > ...
> > > > > > Info: ****************************
> > > > > > Info: STRUCTURE SUMMARY:
> > > > > > Info: 66 ATOMS
> > > > > > Info: 65 BONDS
> > > > > > Info: 96 ANGLES
> > > > > > Info: 31 DIHEDRALS
> > > > > > Info: 32 IMPROPERS
> > > > > > Info: 0 EXCLUSIONS
> > > > > > Info: 195 DEGREES OF FREEDOM
> > > > > > Info: 55 HYDROGEN GROUPS
> > > > > > Info: TOTAL MASS = 783.886 amu
> > > > > > Info: TOTAL CHARGE = 8.19564e-08 e
> > > > > > Info: *****************************
> > > > > > [20] stopped in suspend() at line 153 in file "BackEnd.cc" ($t1)
> > > > > > 153 CsdScheduler(-1);
> > > > > > (dbx) s
> > > > > > Info: Entering startup phase 0 with 3804 kB of memory in use.
> > > > > > Info: Entering startup phase 1 with 3804 kB of memory in use.
> > > > > >
> > > > > > Illegal instruction in . at 0x0 ($t2)
> > > > > > 0x00000000 00000000 Invalid opcode.
> > > > > > (dbx) where
> > > > > > warning: could not locate trace table from starting address 0x0
> > > > > > CmiHandleMessage(0x305d0a08) at 0x10011b38
> > > > > > CsdScheduleForever() at 0x10012be4
> > > > > > CsdScheduler(0xffffffff) at 0x10012d0c
> > > > > > slave_init(int,char**)(argc = 3, argv = 0x3027b6d8), line 94 in "BackEnd.cc"
> > > > > > ConverseRunPE(0x0) at 0x1000c96c
> > > > > > call_startfn(0x1) at 0x1000b810
> > > > > > _pthread_body(??) at 0xd004b3fc
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Hansang Bae
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
>

*****************************************************************
**Brian Bennion, Ph.D. **
**Computational and Systems Biology Division **
**Biology and Biotechnology Research Program **
**Lawrence Livermore National Laboratory **
**P.O. Box 808, L-448 bennion1_at_llnl.gov **
**7000 East Avenue phone: (925) 422-5722 **
**Livermore, CA 94550 fax: (925) 424-6605 **
*****************************************************************
