Re: Illegal instruction signal at startup. (with net-rs6k smp)

From: Hansang Bae (baeh_at_ecn.purdue.edu)
Date: Fri Apr 09 2004 - 16:20:20 CDT

I see. I hadn't noticed that standalone mode (without charmrun) only works
on Solaris and Windows, because I have been running it on a Sun
workstation so far.

Thank you very much.

Thanks,
Hansang Bae

On Fri, 9 Apr 2004, Brian Bennion wrote:

> Hi,
>
> Sorry to butt in, but doesn't the +p2 argument require namd2 to be
> launched through charmrun?
>
> i.e.,
>
> charmrun ++local namd2 +p2 alanin.namd
>
>
> Brian
>
>
> On Fri, 9 Apr 2004, Hansang Bae wrote:
>
> > I tried your fix but it didn't work.
> > Actually, I narrowed down the place where it crashes.
> >
> > I'm running namd with command line:
> > namd2 +p2 alanin.namd
> >
> > The error occurs in the second thread when it tries to execute
> > (h->hdlr)(msg,h->userPtr); (line 938 in convcore.c)
> >
> > where both h->hdlr and h->userPtr are null (h->hdlr is the crucial
> > one, I think).
> >
> > Do you have any idea what could cause this?
> >
> > Thanks,
> > Hansang Bae
> >
> > On Thu, 8 Apr 2004, Gengbin Zheng wrote:
> >
> > >
> > > Hi Hansang,
> > >
> > > It seems that there is some problem with the new built-in GNU malloc of
> > > Charm++. Please see whether this fixes it:
> > >
> > > Edit charm/net-rs6k-smp/tmp/conv-mach-smp.h and add this:
> > >
> > > #undef CMK_MALLOC_USE_GNU_MALLOC
> > > #undef CMK_MALLOC_USE_OS_BUILTIN
> > > #define CMK_MALLOC_USE_OS_BUILTIN 1
> > >
> > > Do a clean make (make clean, and make charm++ OPTS=-g)
> > > and re-link namd2.
> > >
> > > Please let me know if this works or not,
> > >
> > > Gengbin
> > >
> > > On Thu, 8 Apr 2004, Gengbin Zheng wrote:
> > >
> > > >
> > > > I see. Could you send me the command-line options that produce this
> > > > crash? I assume this is with alanin.
> > > >
> > > > Gengbin
> > > >
> > > >
> > > > On Thu, 8 Apr 2004, Hansang Bae wrote:
> > > >
> > > > > Of course, I compiled this version with the -g option, and the other
> > > > > versions, net-rs6k and mpi-sp, do not have any problems. I'm using
> > > > > tcl-8.4.4 and fftw-2.1.5.
> > > > >
> > > > > Thanks,
> > > > > Hansang Bae
> > > > > 1285 EE Building, Mail Box #58
> > > > > West Lafayette, IN 47907-1285
> > > > > (H) 765-496-4729
> > > > > (L) 765-494-3550 (EE 347)
> > > > >
> > > > > On Thu, 8 Apr 2004, Gengbin Zheng wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > It is a little hard to tell what is wrong here. I would suggest building
> > > > > > your own binary (there may be a binary or library incompatibility problem).
> > > > > > As further options, you can try net-rs6k (without smp) or an MPI version
> > > > > > like mpi-sp|IBM-SP.
> > > > > >
> > > > > > Gengbin
> > > > > >
> > > > > > On Tue, 6 Apr 2004, Hansang Bae wrote:
> > > > > >
> > > > > > > I have a problem running the AIX-RS6000-SMP version with multiple threads.
> > > > > > > It crashes with an illegal instruction exception during the startup phase.
> > > > > > > The strange thing is that sometimes this doesn't happen.
> > > > > > >
> > > > > > > Here is some information from the dbx log.
> > > > > > >
> > > > > > > ...
> > > > > > > Info: ****************************
> > > > > > > Info: STRUCTURE SUMMARY:
> > > > > > > Info: 66 ATOMS
> > > > > > > Info: 65 BONDS
> > > > > > > Info: 96 ANGLES
> > > > > > > Info: 31 DIHEDRALS
> > > > > > > Info: 32 IMPROPERS
> > > > > > > Info: 0 EXCLUSIONS
> > > > > > > Info: 195 DEGREES OF FREEDOM
> > > > > > > Info: 55 HYDROGEN GROUPS
> > > > > > > Info: TOTAL MASS = 783.886 amu
> > > > > > > Info: TOTAL CHARGE = 8.19564e-08 e
> > > > > > > Info: *****************************
> > > > > > > [20] stopped in suspend() at line 153 in file "BackEnd.cc" ($t1)
> > > > > > > 153 CsdScheduler(-1);
> > > > > > > (dbx) s
> > > > > > > Info: Entering startup phase 0 with 3804 kB of memory in use.
> > > > > > > Info: Entering startup phase 1 with 3804 kB of memory in use.
> > > > > > >
> > > > > > > Illegal instruction in . at 0x0 ($t2)
> > > > > > > 0x00000000 00000000 Invalid opcode.
> > > > > > > (dbx) where
> > > > > > > warning: could not locate trace table from starting address 0x0
> > > > > > > CmiHandleMessage(0x305d0a08) at 0x10011b38
> > > > > > > CsdScheduleForever() at 0x10012be4
> > > > > > > CsdScheduler(0xffffffff) at 0x10012d0c
> > > > > > > slave_init(int,char**)(argc = 3, argv = 0x3027b6d8), line 94 in "BackEnd.cc"
> > > > > > > ConverseRunPE(0x0) at 0x1000c96c
> > > > > > > call_startfn(0x1) at 0x1000b810
> > > > > > > _pthread_body(??) at 0xd004b3fc
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Hansang Bae
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
> *****************************************************************
> **Brian Bennion, Ph.D. **
> **Computational and Systems Biology Division **
> **Biology and Biotechnology Research Program **
> **Lawrence Livermore National Laboratory **
> **P.O. Box 808, L-448 bennion1_at_llnl.gov **
> **7000 East Avenue phone: (925) 422-5722 **
> **Livermore, CA 94550 fax: (925) 424-6605 **
> *****************************************************************
>
>
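Editorial note for readers of this thread: the dbx trace (an illegal instruction at address 0x0, reached from CmiHandleMessage) together with the report that h->hdlr is null is consistent with a call through a null function pointer. The following is a minimal C sketch of that failure mode and of a defensive check before the call. It is not the actual Converse code: the names handler_entry, handler_table, dispatch, and count_handler are hypothetical, and only the hdlr/userPtr field names are borrowed from the thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler record. It mirrors the h->hdlr / h->userPtr shape
   quoted in the thread, but it is not the real Converse structure. */
typedef void (*handler_fn)(void *msg, void *user_ptr);

typedef struct {
    handler_fn hdlr;
    void *userPtr;
} handler_entry;

#define TABLE_SIZE 4

/* Static storage is zero-initialized, so every hdlr starts out NULL,
   like a slot that was never registered. */
static handler_entry handler_table[TABLE_SIZE];

/* Example handler: increments the int that user_ptr points to. */
static void count_handler(void *msg, void *user_ptr) {
    (void)msg;
    ++*(int *)user_ptr;
}

/* Calls the handler in slot idx. Returns 0 on success, or -1 if idx is
   out of range or the slot has no handler. Without the NULL check, the
   call below would jump to address 0. */
static int dispatch(size_t idx, void *msg) {
    if (idx >= TABLE_SIZE) {
        return -1;
    }
    handler_entry *h = &handler_table[idx];
    if (h->hdlr == NULL) {
        fprintf(stderr, "no handler registered for index %zu\n", idx);
        return -1;
    }
    (h->hdlr)(msg, h->userPtr);
    return 0;
}
```

Calling dispatch on an unregistered slot returns -1 instead of crashing; after a slot's hdlr is set to count_handler and its userPtr to the address of an int, dispatch returns 0 and the int is incremented. A check like this only reports the symptom. It does not explain why the slot was empty on the second thread.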

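Editorial note: the #undef / #define sequence suggested for conv-mach-smp.h is an ordinary preprocessor override, in which earlier definitions are dropped and one flag is then set explicitly. The sketch below shows the mechanics in isolation. The two "earlier" definitions, the ALLOCATOR_NAME macro, and the allocator_name function are assumptions made for illustration, and the #if chain is not how Charm++ itself consumes these flags.

```c
/* Assumed starting point: some earlier header selected the bundled
   GNU malloc. These two lines are illustrative only. */
#define CMK_MALLOC_USE_GNU_MALLOC 1
#define CMK_MALLOC_USE_OS_BUILTIN 0

/* The override, in the same order as suggested in the thread. */
#undef CMK_MALLOC_USE_GNU_MALLOC
#undef CMK_MALLOC_USE_OS_BUILTIN
#define CMK_MALLOC_USE_OS_BUILTIN 1

/* Inside #if, an undefined macro evaluates to 0, so the first branch
   is skipped once CMK_MALLOC_USE_GNU_MALLOC has been undefined. */
#if CMK_MALLOC_USE_GNU_MALLOC
#  define ALLOCATOR_NAME "gnu"
#elif CMK_MALLOC_USE_OS_BUILTIN
#  define ALLOCATOR_NAME "os-builtin"
#else
#  define ALLOCATOR_NAME "unknown"
#endif

/* Hypothetical helper that reports which branch was compiled in. */
static const char *allocator_name(void) {
    return ALLOCATOR_NAME;
}
```

Because the choice is made at compile time, every object file that includes the header has to be rebuilt for the change to take effect, which fits the advice in the thread to do a clean make and re-link namd2.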