Re: NAMD 2.8 on Cray XE6 segfaulting

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Tue Jul 26 2011 - 11:05:30 CDT

Hi,

Add -DNOHOSTNAME to the CXX definition in CRAY-XT-g++.arch (see
http://www.ks.uiuc.edu/Research/namd/cvs2html/CRAY-XT-g++.arch_arch_diff_1.6_1.5.html)
and use the old Tcl 8.3.3 library from
http://www.ks.uiuc.edu/Research/namd/libraries/tcl-linux-amd64.tar.gz
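
A minimal sketch of the first step, assuming the arch file defines the compiler on a single line beginning with "CXX =" (the helper name add_nohostname and the backup suffix .orig are hypothetical, not part of NAMD). It appends -DNOHOSTNAME to that line and does nothing if the flag is already present:

```shell
# Hypothetical helper: append -DNOHOSTNAME to the CXX line of a NAMD arch file.
# Assumes the CXX definition sits on one line (no backslash continuation).
add_nohostname() {
    arch_file="$1"

    # Do nothing if the flag is already on the CXX line.
    if grep -q '^CXX[[:space:]]*=.*-DNOHOSTNAME' "$arch_file"; then
        return 0
    fi

    # Keep a copy of the original, then rewrite the CXX line with the flag appended.
    cp "$arch_file" "$arch_file.orig"
    sed 's/^\(CXX[[:space:]]*=.*\)$/\1 -DNOHOSTNAME/' "$arch_file.orig" > "$arch_file"
}

# Example call from the top of the NAMD source tree (path is an assumption):
#   add_nohostname arch/CRAY-XT-g++.arch
```

After the edit, rerunning ./config CRAY-XT-g++ and make should pick up the new definition. Check the CXX line by eye first, since the layout of the arch file may differ from what this sketch assumes.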

-Jim

On Tue, 26 Jul 2011, Tim Robinson wrote:

> Dear Cray XE6 owners/users
>
> I am having trouble getting NAMD 2.8 to run on Cray XE6 (2.7 was no
> problem). I have tried with charm-6.3.2 and with charm-6.2.2.
>
> The basic steps are:
>
> ./build charm++ mpi-crayxt --no-build-shared --with-production
> ./config CRAY-XT-g++
> make
>
> (I am using gcc/4.5.2 and fftw/2.1.5.2)
>
> The executable crashes very soon after launch:
>
> Charm++> Running on MPI version: 2.2 multi-thread support: 0 (max
> supported: -1)
> Charm++> Running on 11 unique compute nodes (24-way SMP).
> Info: NAMD 2.8 for CRAY-XT-MPI
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60202 for mpi-crayxt
> Info: Built Tue Jul 26 12:08:28 CEST 2011 by robinson on palu2
> [0] Stack Traceback:
> [0:0] [0xb89f10]
> [125] Stack Traceback:
> [125:0] [0xb89f10]
> [125:1] [0xb89ebb]
> [125:2] [0xbee6a3]
> [125:3] [0xb15f70]
> [125:4] [0xac55dd]
> [125:5] [0x988153]
> <and so on>
>
>
> The standard error:
>
> ------------- Processor 0 Exiting: Caught Signal ------------
> Signal: 11
> Rank 125 [Tue Jul 26 12:26:57 2011] [c1-0c2s2n3] application called
> MPI_Abort(MPI_COMM_WORLD, 1) - process 125
> ------------- Processor 125 Exiting: Caught Signal ------------
> Signal: 6
> Rank 124 [Tue Jul 26 12:26:57 2011] [c1-0c2s2n3] application called
> MPI_Abort(MPI_COMM_WORLD, 1) - process 124
> ------------- Processor 124 Exiting: Caught Signal ------------
> Signal: 6
> Rank 121 [Tue Jul 26 12:26:57 2011] [c1-0c2s2n3] application called
> MPI_Abort(MPI_COMM_WORLD, 1) - process 121
> ------------- Processor 121 Exiting: Caught Signal ------------
> <and so on>
>
> Does anyone have a working build of 2.8 on XE6?
>
> Many thanks in advance,
>
> Tim
>
> --
> Dr Tim Robinson
> HPC Application Analyst
> Swiss National Supercomputing Centre
> Galleria 2, Via Cantonale
> 6928 Manno
>
