Re: NAMD 2.8 on Cray XE6 segfaulting

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Tue Jul 26 2011 - 16:14:34 CDT

Hi again,

I now have a slightly modified Tcl 8.5.9 that works on the Cray XE at
http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-crayxe.tar.gz or
http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-crayxe-threaded.tar.gz

This is the patch; it just avoids getpwuid during initialization:
http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-crayxe-tclUnixInit.patch

-Jim

On Tue, 26 Jul 2011, Jim Phillips wrote:

>
> Make that -DNOHOSTNAME -DNO_GETPWUID; otherwise DCD header writing will fail.
>
> -Jim
>
> On Tue, 26 Jul 2011, Jim Phillips wrote:
>
>> Hi,
>>
>> Add -DNOHOSTNAME to the CXX definition in CRAY-XT-g++.arch (see
>> http://www.ks.uiuc.edu/Research/namd/cvs2html/CRAY-XT-g++.arch_arch_diff_1.6_1.5.html)
>> and use the old Tcl 8.3.3 library from
>> http://www.ks.uiuc.edu/Research/namd/libraries/tcl-linux-amd64.tar.gz
>>
>> -Jim
>>
>>
>> On Tue, 26 Jul 2011, Tim Robinson wrote:
>>
>>> Dear Cray XE6 owners/users,
>>>
>>> I am having trouble getting NAMD 2.8 to run on Cray XE6 (2.7 was no
>>> problem). I have tried with charm-6.3.2 and with charm-6.2.2.
>>>
>>> The basic steps are:
>>>
>>> ./build charm++ mpi-crayxt --no-build-shared --with-production
>>> ./config CRAY-XT-g++
>>> make
>>>
>>> (I am using gcc/4.5.2 and fftw/2.1.5.2)
>>>
>>> The executable crashes very soon after launch:
>>>
>>> Charm++> Running on MPI version: 2.2 multi-thread support: 0 (max supported: -1)
>>> Charm++> Running on 11 unique compute nodes (24-way SMP).
>>> Info: NAMD 2.8 for CRAY-XT-MPI
>>> Info:
>>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>>> Info: for updates, documentation, and support information.
>>> Info:
>>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>>> Info: in all publications reporting results obtained with NAMD.
>>> Info:
>>> Info: Based on Charm++/Converse 60202 for mpi-crayxt
>>> Info: Built Tue Jul 26 12:08:28 CEST 2011 by robinson on palu2
>>> [0] Stack Traceback:
>>> [0:0] [0xb89f10]
>>> [125] Stack Traceback:
>>> [125:0] [0xb89f10]
>>> [125:1] [0xb89ebb]
>>> [125:2] [0xbee6a3]
>>> [125:3] [0xb15f70]
>>> [125:4] [0xac55dd]
>>> [125:5] [0x988153]
>>> <and so on>
>>>
>>>
>>> The standard error:
>>>
>>> ------------- Processor 0 Exiting: Caught Signal ------------
>>> Signal: 11
>>> Rank 125 [Tue Jul 26 12:26:57 2011] [c1-0c2s2n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 125
>>> ------------- Processor 125 Exiting: Caught Signal ------------
>>> Signal: 6
>>> Rank 124 [Tue Jul 26 12:26:57 2011] [c1-0c2s2n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 124
>>> ------------- Processor 124 Exiting: Caught Signal ------------
>>> Signal: 6
>>> Rank 121 [Tue Jul 26 12:26:57 2011] [c1-0c2s2n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 121
>>> ------------- Processor 121 Exiting: Caught Signal ------------
>>> <and so on>
>>>
>>> Does anyone have a working build of 2.8 on XE6?
>>>
>>> Many thanks in advance,
>>>
>>> Tim
>>>
>>> --
>>> Dr Tim Robinson
>>> HPC Application Analyst
>>> Swiss National Supercomputing Centre
>>> Galleria 2, Via Cantonale
>>> 6928 Manno
>>>
>>
>
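
For anyone applying the compiler-flag change from the thread by hand, here is
a rough sketch of adding -DNOHOSTNAME and -DNO_GETPWUID to the CXX definition
of an arch file. The real contents of CRAY-XT-g++.arch are not shown in the
thread, so the file created below is a hypothetical stand-in with a
placeholder "CXX = CC" line, and the add_flag helper is made up for
illustration. Treat it as a starting point, not as the official procedure.

```shell
# Hypothetical stand-in for arch/CRAY-XT-g++.arch; the real file's contents
# are an assumption here. Point arch_file at the real file to use this.
workdir=$(mktemp -d)
arch_file="$workdir/CRAY-XT-g++.arch"
printf 'CXX = CC\nCC = cc\n' > "$arch_file"

# Append a define to the CXX line only, skipping it if already present.
# add_flag is an illustrative helper, not part of NAMD or Charm++.
add_flag() {
  flag=$1
  grep -q -- "^CXX = .*$flag" "$arch_file" ||
    sed -i.bak "s/^CXX = .*/& $flag/" "$arch_file"
}

add_flag -DNOHOSTNAME
add_flag -DNO_GETPWUID

# Show the result so the change can be checked before rebuilding.
cat "$arch_file"
```

After editing the real arch file, rerun ./config CRAY-XT-g++ and make as in
the original build steps so the new defines take effect.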
