Re: How to solve the segmentation faults in compiling NAMD to build and test the Charm++/Converse library (MPI version)?

From: Scott Brozell (srb_at_osc.edu)
Date: Fri May 04 2018 - 11:17:32 CDT

Hi,

This looks like a misconfigured or incompletely configured cluster, perhaps
because the cluster has just been upgraded. Have you contacted the cluster
admin?

See for example
https://community.mellanox.com/thread/3812

and see p. 17 of
https://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters_Archive.pdf
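
For what it's worth, the source paths in the posted backtrace already hint at
which libraries are involved. A minimal sketch, assuming the wrapped log lines
have been rejoined so that each frame sits on one line; backtrace_packages is
a made-up helper name, not part of any tool:

```shell
# Hypothetical helper: read backtrace text on stdin and print the distinct
# package directories that appear directly under ".../BUILD/".
backtrace_packages() {
    grep -o 'BUILD/[^/][^/]*' | sed 's|^BUILD/||' | sort -u
}

# Two frames copied from the posted backtrace, rejoined onto single lines.
printf '%s\n' \
  '2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641' \
  '5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976' \
  | backtrace_packages
```

For those two frames this lists mxm-3.6.3104 and openmpi-3.0.0rc6, both built
under an OFED tree, which is the sort of detail worth passing on to the admin.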

scott

On Fri, May 04, 2018 at 09:05:21AM -0400, Brian Radak wrote:
> I'm not a charm++ expert, but this looks like a compiling issue there. I
> almost exclusively use the smartbuild.pl script, which has been fairly
> robust for me, so long as you tell it to use the MPI installation that it
> autodetects (that's not the default). Maybe give this a try?
>
> On Fri, May 4, 2018 at 8:07 AM, LIAO Mingling <Mingling1949_at_hotmail.com>
> wrote:
>
> > Dear all,
> >
> > I am trying to compile the MPI version of NAMD from source on the
> > upgraded cluster (x86_64 GNU/Linux).
> >
> > When I typed the command to build and test the Charm++/Converse library:
> >
> > env MPICXX=mpicxx ./build charm++ mpi-linux-x86_64 --with-production
> >
> > I got the following error. Does anyone know how to solve this problem?
> > Thanks in advance!
> >
> >
> >
> > *ERROR:*
> >
> > ###########
> >
> > [1525432978.493880] [login01:30113:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2000.00
> > [1525432978.504097] [login01:30114:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2000.00
> > [1525432978.507528] [login01:30111:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2000.00
> > [1525432978.514122] [login01:30112:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2000.00
> > [login01:30113:0] Caught signal 11 (Segmentation fault)
> > [login01:30114:0] Caught signal 11 (Segmentation fault)
> > [login01:30111:0] Caught signal 11 (Segmentation fault)
> > [login01:30112:0] Caught signal 11 (Segmentation fault)
> >
> > ==== backtrace ====
> > 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
> > 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
> > 4 0x0000000000035270 killpg() ??:0
> > 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
> > 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
> > 7 0x00000000006bff0d ConverseInit() ??:0
> > 8 0x0000000000591799 main() ??:0
> > 9 0x0000000000021c05 __libc_start_main() ??:0
> > 10 0x00000000004c66b9 _start() ??:0
> > ===================
> >
> > ==== backtrace ====
> > 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
> > 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
> > 4 0x0000000000035270 killpg() ??:0
> > 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
> > 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
> > 7 0x00000000006bff0d ConverseInit() ??:0
> > 8 0x0000000000591799 main() ??:0
> > 9 0x0000000000021c05 __libc_start_main() ??:0
> > 10 0x00000000004c66b9 _start() ??:0
> > ===================
> >
> > ==== backtrace ====
> > 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
> > 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
> > 4 0x0000000000035270 killpg() ??:0
> > 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
> > 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
> > 7 0x00000000006bff0d ConverseInit() ??:0
> > 8 0x0000000000591799 main() ??:0
> > 9 0x0000000000021c05 __libc_start_main() ??:0
> > 10 0x00000000004c66b9 _start() ??:0
> > ===================
> >
> > ==== backtrace ====
> > 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
> > 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
> > 4 0x0000000000035270 killpg() ??:0
> > 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
> > 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
> > 7 0x00000000006bff0d ConverseInit() ??:0
> > 8 0x0000000000591799 main() ??:0
> > 9 0x0000000000021c05 __libc_start_main() ??:0
> > 10 0x00000000004c66b9 _start() ??:0
> > ===================
> >
> > #################
> >
> >
> >
> >
> >
> > *COMMANDS:*
> >
> > ###################
> >
> > $ source /etc/profile.d/modules.sh
> > $ module load impi
> > $ module load intel
> > $ tar xzf NAMD_2.12_Source.tar.gz
> > $ cd NAMD_2.12_Source
> >
> > $ wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-x86_64.tar.gz
> > $ tar xzf fftw-linux-x86_64.tar.gz
> > $ mv linux-x86_64 fftw
> > $ wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
> > $ wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
> > $ tar xzf tcl8.5.9-linux-x86_64.tar.gz
> > $ tar xzf tcl8.5.9-linux-x86_64-threaded.tar.gz
> > $ mv tcl8.5.9-linux-x86_64 tcl
> > $ mv tcl8.5.9-linux-x86_64-threaded tcl-threaded
> >
> > $ tar xf charm-6.7.1.tar
> > $ cd charm-6.7.1
> > $ env MPICXX=mpicxx ./build charm++ mpi-linux-x86_64 --with-production
> > $ cd mpi-linux-x86_64/tests/charm++/megatest
> > $ make pgm
> > $ mpiexec -n 4 ./pgm
> >
> > ############################
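
One thing that may be worth checking in the commands above: they load an impi
module, while the backtrace paths name openmpi-3.0.0rc6. A rough sketch for
seeing which MPI family the mpiexec on PATH belongs to; mpi_family is a
hypothetical helper, and the strings it matches are assumptions about typical
version banners, not guaranteed output:

```shell
# Hypothetical helper: read the text of `mpiexec --version` on stdin and
# print a best guess at the MPI family. The matched strings are assumptions.
mpi_family() {
    text=$(cat)
    case "$text" in
        *"Open MPI"*|*OpenRTE*) echo openmpi ;;
        *"Intel(R) MPI"*)       echo intelmpi ;;
        *HYDRA*|*MPICH*)        echo mpich ;;
        *)                      echo unknown ;;
    esac
}

# Only query mpiexec if one is actually on PATH.
if command -v mpiexec >/dev/null 2>&1; then
    mpiexec --version 2>&1 | mpi_family
fi
```

If this reports openmpi after the modules are loaded, the mpicxx and mpiexec
being picked up are probably not the Intel MPI ones. In that case one
diagnostic that may help isolate the problem is "mpiexec --mca pml ob1 -n 4
./pgm", which asks Open MPI to use its ob1 messaging layer rather than an
MXM-based one; whether that is appropriate on this cluster is a question for
the admin.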
