Re: Illegal instruction signal at startup. (with net-rs6k smp)

From: Hansang Bae (baeh_at_ecn.purdue.edu)
Date: Fri Apr 09 2004 - 14:21:52 CDT

I tried your fix, but it didn't work.
However, I have narrowed down the place where it crashes.

I'm running NAMD with the command line:
  namd2 +p2 alanin.namd

The error occurs in the second thread when it tries to execute
  (h->hdlr)(msg,h->userPtr); (line 938 in convcore.c)

where both h->hdlr and h->userPtr are null. (I think the null h->hdlr is
the crucial one, since calling it jumps to address 0x0.)
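
Calling through a null function pointer is consistent with the
"Illegal instruction in . at 0x0" reported by dbx earlier in this thread.
As a minimal sketch only, and not the actual Charm++ code, a guarded
dispatch could report an empty handler slot instead of jumping to 0x0.
The names handler_entry, dispatch_checked and count_handler are
hypothetical; only the field names hdlr and userPtr come from the line
quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a handler-table entry; only the field
   names hdlr and userPtr are taken from the quoted convcore.c line. */
typedef void (*handler_fn)(void *msg, void *userPtr);

typedef struct {
    handler_fn hdlr;
    void *userPtr;
} handler_entry;

/* Calls the handler registered at idx. Returns 0 on success and -1 if
   the slot is out of range or still empty, instead of jumping to 0x0. */
static int dispatch_checked(const handler_entry *table, size_t count,
                            size_t idx, void *msg)
{
    assert(table != NULL);
    if (idx >= count || table[idx].hdlr == NULL) {
        fprintf(stderr, "no handler registered for index %lu\n",
                (unsigned long)idx);
        return -1;
    }
    table[idx].hdlr(msg, table[idx].userPtr);
    return 0;
}

/* Example handler: counts how many messages it has seen. */
static void count_handler(void *msg, void *userPtr)
{
    (void)msg;
    ++*(int *)userPtr;
}
```

A check like this would not fix the underlying problem (a message
arriving before its handler is registered on that thread, or a corrupted
table, are both only guesses here), but it would turn the crash into a
message that names the offending handler index.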

Do you have any idea what could cause this?

Thanks,
Hansang Bae

On Thu, 8 Apr 2004, Gengbin Zheng wrote:

>
> Hi Hansang,
>
> It seems that there is some problem with the new built-in GNU malloc of
> Charm++. Please check whether this fixes it:
>
> Edit charm/net-rs6k-smp/tmp/conv-mach-smp.h and add this:
>
> #undef CMK_MALLOC_USE_GNU_MALLOC
> #undef CMK_MALLOC_USE_OS_BUILTIN
> #define CMK_MALLOC_USE_OS_BUILTIN 1
>
> Do a clean build (make clean, then make charm++ OPTS=-g)
> and re-link namd2.
>
> Please let me know if this works or not,
>
> Gengbin
>
> On Thu, 8 Apr 2004, Gengbin Zheng wrote:
>
> >
> > I see. Could you send me the command line options you use to get this
> > crash? I suppose this is alanin.
> >
> > Gengbin
> >
> >
> > On Thu, 8 Apr 2004, Hansang Bae wrote:
> >
> > > Of course, I compiled this version with the -g option. The other
> > > versions, net-rs6k and mpi-sp, do not have any problem. I'm using
> > > tcl-8.4.4 and fftw-2.1.5.
> > >
> > > Thanks,
> > > Hansang Bae
> > > 1285 EE Building, Mail Box #58
> > > West Lafayette, IN 47907-1285
> > > (H) 765-496-4729
> > > (L) 765-494-3550 (EE 347)
> > >
> > > On Thu, 8 Apr 2004, Gengbin Zheng wrote:
> > >
> > > >
> > > >
> > > > It is a little hard to find anything wrong here. I would suggest
> > > > building your own binary (there may be a binary or library
> > > > incompatibility problem). As further options, you can try net-rs6k
> > > > (without smp) or an MPI version like mpi-sp|IBM-SP.
> > > >
> > > > Gengbin
> > > >
> > > > On Tue, 6 Apr 2004, Hansang Bae wrote:
> > > >
> > > > > I have a problem running the AIX-RS6000-SMP version with multiple threads.
> > > > > It crashes with an illegal instruction exception during the startup
> > > > > phase. The strange thing is that sometimes this doesn't happen.
> > > > >
> > > > > Here is some information from the dbx log.
> > > > >
> > > > > ...
> > > > > Info: ****************************
> > > > > Info: STRUCTURE SUMMARY:
> > > > > Info: 66 ATOMS
> > > > > Info: 65 BONDS
> > > > > Info: 96 ANGLES
> > > > > Info: 31 DIHEDRALS
> > > > > Info: 32 IMPROPERS
> > > > > Info: 0 EXCLUSIONS
> > > > > Info: 195 DEGREES OF FREEDOM
> > > > > Info: 55 HYDROGEN GROUPS
> > > > > Info: TOTAL MASS = 783.886 amu
> > > > > Info: TOTAL CHARGE = 8.19564e-08 e
> > > > > Info: *****************************
> > > > > [20] stopped in suspend() at line 153 in file "BackEnd.cc" ($t1)
> > > > > 153 CsdScheduler(-1);
> > > > > (dbx) s
> > > > > Info: Entering startup phase 0 with 3804 kB of memory in use.
> > > > > Info: Entering startup phase 1 with 3804 kB of memory in use.
> > > > >
> > > > > Illegal instruction in . at 0x0 ($t2)
> > > > > 0x00000000 00000000 Invalid opcode.
> > > > > (dbx) where
> > > > > warning: could not locate trace table from starting address 0x0
> > > > > CmiHandleMessage(0x305d0a08) at 0x10011b38
> > > > > CsdScheduleForever() at 0x10012be4
> > > > > CsdScheduler(0xffffffff) at 0x10012d0c
> > > > > slave_init(int,char**)(argc = 3, argv = 0x3027b6d8), line 94 in "BackEnd.cc"
> > > > > ConverseRunPE(0x0) at 0x1000c96c
> > > > > call_startfn(0x1) at 0x1000b810
> > > > > _pthread_body(??) at 0xd004b3fc
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Hansang Bae
> > > > >
> > > >
> > > >
> > >
> >
>
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:38:34 CST