RE: Problems compiling NAMD

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Tue Oct 07 2008 - 18:44:06 CDT

On Wed, 8 Oct 2008, Jesper Sørensen wrote:

JS> Hi Alexander and Axel,
JS>
JS> Thank you both for the comments.
JS>
JS> Intel fails already at testing charm++ and this is just for the net-version,
JS> not MPI.
JS> I'm using build options:
JS> > ./build charm++ net-linux-amd64 icc -no-shared -O- DCMK_OPTIMIZE=1

please note that the compilation notes say to use:

  --no-shared -O -DCMK_OPTIMIZE=1
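
a typo in the option list is easy to miss. as a minimal sketch (check_build_opts is a hypothetical helper, not part of charm++, and it only knows the three option forms named in the compilation notes), the options can be checked before they are handed to ./build:

```shell
# check_build_opts: hypothetical helper, NOT part of charm++.
# accepts only the option forms named in the compilation notes
# and reports anything else as suspicious.
check_build_opts() {
    rc=0
    for opt in "$@"; do
        case "$opt" in
            --no-shared|-O|-D*=*) ;;   # known-good forms
            *) echo "suspicious option: $opt"; rc=1 ;;
        esac
    done
    return $rc
}

# the options as typed in the quoted build line: all three are flagged
check_build_opts -no-shared -O- DCMK_OPTIMIZE=1 || echo "fix the options above before building"

# the options as given in the compilation notes: nothing is flagged
check_build_opts --no-shared -O -DCMK_OPTIMIZE=1 && echo "options look fine"
```

this only looks at the spelling of the options; it says nothing about whether the resulting build works.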

JS> The test fails with:
JS>
JS> >./charmrun ./pgm +p1
JS> >Megatest is running on 1 processors.

hmmmm.... which version of charm++ are you
trying to compile? is it the 5.9 version
bundled with namd2.6?

i just tried compiling a net version of charm++ on my
desktop (intel icc 9.1.045, x86_64 cpu) and it works fine,
but there i am using the current charm++ cvs code.

i found a copy of charm-5.9 on a different x86_64 machine
that has intel 10.1.015. however when compiling and
testing it, i get the same segmentation fault in test14
of megatest. on the other hand compiling with gcc (v4.1.2)
worked, after fixing a few inconsistencies in the code that
gcc4 chokes on.

so perhaps there is some code in this charm++ version that
triggers a bug in the intel compiler or there is a bug
in the code that is only exposed by intel compilers...
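
when comparing builds from different compilers, it helps to pull the last started test out of each megatest log. a minimal sketch, assuming the output was saved to a file and contains "test N: initiated" lines like those quoted in this thread (last_started_test is a hypothetical helper, and the sample log is copied from the quoted output):

```shell
# last_started_test: hypothetical helper. prints the last
# "test N: initiated" line found in a saved megatest log.
last_started_test() {
    awk '/^test [0-9]+: initiated/ { last = $0 }
         END { if (last != "") print last }' "$1"
}

# sample log text taken from the output quoted in this thread
log=$(mktemp)
cat > "$log" <<'EOF'
Megatest is running on 1 processors.
test 14: initiated [tempotest (fang)]
------------- Processor 0 Exiting: Caught Signal ------------
Signal: segmentation violation
EOF

last_started_test "$log"
rm -f "$log"
```

running this on the logs from the icc and gcc builds shows whether both stop at the same test.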

hope this helps,
    axel.

JS> >...
JS> >test 14: initiated [tempotest (fang)]
JS> >------------- Processor 0 Exiting: Caught Signal ------------
JS> >Signal: segmentation violation
JS> >Suggestion: Try running with '++debug', or linking with '-memory paranoid'.
JS> >Stack Traceback:
JS> > [0] /lib64/tls/libc.so.6 [0x335502e380]
JS> > [1] [0x6bedd0]
JS> >Fatal error on PE 0> segmentation violation
JS> >make: *** [test] Error 1
JS>
JS> Do you guys or anybody else have a suggestion to what might be wrong?
JS>
JS> Kind regards,
JS>
JS> Jesper
JS>
JS>
JS> -----Original message-----
JS> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of
JS> Axel Kohlmeyer
JS> Sent: 7 October 2008 18:54
JS> To: Jesper Soerensen
JS> Cc: Alexander A. Vakhrushev; namd-l_at_ks.uiuc.edu
JS> Subject: Re: namd-l: Problems compiling NAMD
JS>
JS> On Tue, 7 Oct 2008, Jesper Soerensen wrote:
JS>
JS> JS> Hi Alexander,
JS> JS>
JS> JS> I'm assuming that if I don't specify a compiler it defaults to gcc?
JS> JS> Then yes I have made a version of gcc without MPI, but once I add MPI it
JS> JS> fails. It seems to be that our MPI has been compiled with either intel
JS> JS> or pgi compilers and so gcc fails because MPI calls some intel keywords.
JS> JS>
JS> JS> for example while running Make pgm:
JS> JS>
JS> JS> >/com/mpich-1.2.7p1/lib/libmpich.a(dmpipk.o)(.text+0x249): In function
JS> JS> >`MPIR_UnPack_Hvector':
JS> JS> >: undefined reference to `_intel_fast_memcpy'
JS> JS>
JS> JS> I can ask our sysadmin to make a gcc version of MPI and see if that
JS> JS> helps. Would that be a good idea?
JS>
JS>
JS> yes. one - easier to achieve - alternative, would be to reduce
JS> the optimization level in the respective "arch" files. compilers
JS> become increasingly unreliable with higher optimization levels.
JS> with intel compilers, i found that a combination of flags like
JS>
JS> -O2 -unroll -march=pentiumpro -mtune=pentiumpro -pc64
JS>
JS> produces fast and reliably working executables on AMD cpus.
JS>
JS> i would also, in case you are still getting segmentation
JS> faults, remove the -ip flag.
JS>
JS> cheers,
JS> axel.
JS>
JS>
JS>
JS> JS>
JS> JS> Kind regards,
JS> JS> Jesper
JS> JS>
JS> JS>
JS> JS>
JS> JS> On Mon, 2008-10-06 at 21:00 +0500, Alexander A. Vakhrushev wrote:
JS> JS> > Hi Jesper!
JS> JS> >
JS> JS> > Did you try just gcc version?
JS> JS> >
JS> JS> > 2008/10/6 Jesper Soerensen <jes_at_chem.au.dk>:
JS> JS> > > Hi Alexander,
JS> JS> > >
JS> JS> > > I'm running CentOS 4.3. I've tried using both OpenMPI and MPICH but both
JS> JS> > > fail. I've tried:
JS> JS> > > OpenMPI version 1.2.6
JS> JS> > > MPICH version 1.2.7
JS> JS> > > Using Intel compilers (icc & ifort) version 10.1.017
JS> JS> > >
JS> JS> > > I ran the megatest set from charm++ and I get a failure in test14:
JS> JS> > >
JS> JS> > >> ./mpirun ./pgm
JS> JS> > >> ...
JS> JS> > >> test 14: initiated [tempotest (fang)]
JS> JS> > >> p0_29927: p4_error: interrupt SIGSEGV: 11
JS> JS> > >
JS> JS> > >
JS> JS> > >> mpirun -np 4 -all-local ./pgm
JS> JS> > >> Megatest is running on 4 processors.
JS> JS> > >> ...
JS> JS> > >> test 14: initiated [tempotest (fang)]
JS> JS> > >> p2_30323: p4_error: interrupt SIGSEGV: 11
JS> JS> > >> p3_30346: p4_error: Found a dead connection while looking for messages: 0
JS> JS> > >> [jesper_at_fe1 megatest]$ p1_30297: p4_error: interrupt SIGx: 13
JS> JS> > >> rm_l_2_30324: (10.011719) net_send: could not write to fd=5, errno = 32
JS> JS> > >> rm_l_3_30347: (7.667969) net_send: could not write to fd=5, errno = 32
JS> JS> > >> p2_30323: (12.027344) net_send: could not write to fd=5, errno = 32
JS> JS> > >> p3_30346: (13.675781) net_send: could not write to fd=5, errno = 32
JS> JS> > >> p1_30297: (18.843750) net_send: could not write to fd=5, errno = 32
JS> JS> > >
JS> JS> > > Does anybody recognize this?
JS> JS> > >
JS> JS> > > Kind regards,
JS> JS> > >
JS> JS> > > Jesper
JS> JS> > >
JS> JS> > >
JS> JS> > >
JS> JS> > >
JS> JS> > > On Fri, 2008-10-03 at 22:44 +0500, Alexander A. Vakhrushev wrote:
JS> JS> > >> Hi Jesper!
JS> JS> > >>
JS> JS> > >> What is platform of your cluster?
JS> JS> > >>
JS> JS> > >> 2008/10/3 Jesper Soerensen <jes_at_chem.au.dk>:
JS> JS> > >> > Hi,
JS> JS> > >> >
JS> JS> > >> > I've just compiled NAMD on our cluster and this runs through fine, but
JS> JS> > >> > when I start a job I get the following error in the log file:
JS> JS> > >> >>Info: Entering startup phase 8 with 134856 kB of memory in use.
JS> JS> > >> >>Info: Finished startup with 143120 kB of memory in use.
JS> JS> > >> >>1 additional process aborted (not shown)
JS> JS> > >> >
JS> JS> > >> > And the cluster job-error log says:
JS> JS> > >> >>mpirun noticed that job rank 0 with PID 25232 on node s07n06 exited on
JS> JS> > >> >>signal 11 (Segmentation fault).
JS> JS> > >> >
JS> JS> > >> > I am running a Linux-amd64-MPI-icc-ifort version if this helps. Also, let
JS> JS> > >> > me know if there is more information I can give to help solve the
JS> JS> > >> > problem. I'm just wondering if anybody has seen this type of error
JS> JS> > >> > before.
JS> JS> > >> >
JS> JS> > >> > Kind regards,
JS> JS> > >> >
JS> JS> > >> > Jesper Soerensen
JS> JS> > >> >
JS> JS> > >> >
JS> JS> > >> > --
JS> JS> > >> > Jesper Sørensen, M.Sc.
JS> JS> > >> > Ph.D.-student
JS> JS> > >> > Biomodelling Group, inSPIN and iNANO centers
JS> JS> > >> > Department of Chemistry
JS> JS> > >> > University of Aarhus
JS> JS> > >> > Langelandsgade 140
JS> JS> > >> > 8000 Aarhus C
JS> JS> > >> > Office: 1510-419
JS> JS> > >> > Tlf. 89423385
JS> JS> > >> > email: jes_at_chem.au.dk
JS> JS> > >> > www: www.chem.au.dk/~biomodelling
JS> JS> > >> >
JS> JS> > >> >
JS> JS> > >>
JS> JS> > >>
JS> JS> > >>
JS> JS> > >
JS> JS> > >
JS> JS> >
JS> JS> >
JS> JS> >
JS> JS>
JS>
JS>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
