Re: NAMD2.6b2: Segmentation fault

From: Morad Alawneh (alawneh_at_chem.byu.edu)
Date: Tue Aug 29 2006 - 14:18:41 CDT

Thanks for your suggestions.

I compiled NAMD 2.6b1 from scratch and it works without any problem.

I compiled NAMD 2.6b2 from scratch and it gives the segmentation fault.

I also compiled NAMD 2.6b2 from scratch against the charm++ from NAMD 2.6b1,
and it still gives the segmentation fault.
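
For reference, the cross-build went roughly like this (the paths are
placeholders and the exact config options follow the attached build
instructions, so treat this only as a sketch):

# build the charm-5.9 tree shipped with the NAMD 2.6b1 source
cd ~/src/NAMD_2.6b1_Source/charm-5.9
./build charm++ mpi-linux-amd64

# point the NAMD 2.6b2 build at that charm tree, then configure and
# compile the Linux-amd64-MPI target from scratch
cd ~/src/NAMD_2.6b2_Source
# edit Make.charm so that CHARMBASE points at .../NAMD_2.6b1_Source/charm-5.9
./config Linux-amd64-MPI   # plus the tcl/fftw options from the instructions
cd Linux-amd64-MPI
make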

By checking the NamdKnownBugs, I found the following:

    2.6b2

Parallel runs will often crash (segment fault) during startup phase 2
when CMAP crossterms are present in the psf file. Fixed.

According to that note the problem should already be fixed, so I downloaded
the NAMD 2.6b2 source again today and followed your suggestions, but without
any success yet.
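
Since that entry is specific to CMAP crossterms, one quick check (the psf
name below is just an example) is whether the psf actually contains a
crossterm section:

grep "NCRTERM" prod_sys.psf
# a line such as "1234 !NCRTERM: cross-terms" (the count is illustrative)
# means CMAP crossterms are present in the psf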

Here is what I got in the error log file:

bash: line 1: 24937 Segmentation fault /usr/bin/env MPIRUN_MPD=0
MPIRUN_HOST=m4a-3-21.local MPIRUN_PORT=52039
MPIRUN_PROCESSES='m4a-3-21i:m4a-3-21i:m4a-3-21i:m4a-3-21i:m4a-3-20i:m4a-3-20i:m4a-3-20i:m4a-3-20i:m4a-3-19i:m4a-3-19i:m4a-3-19i:m4a-3-19i:m4a-3-18i:m4a-3-18i:m4a-3-18i:m4a-3-18i:m4a-3-17i:m4a-3-17i:m4a-3-17i:m4a-3-17i:m4a-3-16i:m4a-3-16i:m4a-3-16i:m4a-3-16i:m4a-3-15i:m4a-3-15i:m4a-3-15i:m4a-3-15i:m4a-3-14i:m4a-3-14i:m4a-3-14i:m4a-3-14i:'
MPIRUN_RANK=6 MPIRUN_NPROCS=32 MPIRUN_ID=21872
/ibrix/home/mfm42/opt/namd-IB/Linux-amd64-MPI/namd2 +strategy USE_GRID
prod_sys.namd
Terminating processes.
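
For completeness, the sanity checks suggested earlier in the thread would look
roughly like this; the mpirun invocations and the megatest location are
assumptions that may need adjusting for our TopSpin MPI setup:

# charm++ megatest, built inside the same charm-5.9 tree used for NAMD
# (the exact directory may differ in this charm version)
cd charm-5.9/tests/charm++/megatest
make
mpirun -np 4 ./pgm

# NAMD on a single processor, using the same binary as the failing job above
mpirun -np 1 /ibrix/home/mfm42/opt/namd-IB/Linux-amd64-MPI/namd2 prod_sys.namd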

Do you have other suggestions?

Thanks

/*Morad Alawneh*/

*Department of Chemistry and Biochemistry*

*C100 BNSN, BYU*

*Provo, UT 84602*

Jim Phillips wrote:
>
> There are many changes between the two versions. The first test is to
> see if the difference is in NAMD or Charm++. NAMD 2.6b2 should work
> with the version of Charm++ included in NAMD 2.6b1, so you might try
> building that first to see if the problem goes away. I would also
> rebuild 2.6b1 from scratch to see if there has been a change in your
> compilers, etc.
>
> -Jim
>
>
> On Tue, 29 Aug 2006, Morad Alawneh wrote:
>
>> Dear NAMD Developers,
>>
>>
>> After a long time of debugging and testing our hardware, NAMD 2.6b1 runs
>> in parallel without any problem whereas NAMD 2.6b2 does not, even though
>> both were installed with the same instructions. Both versions work in
>> serial and in parallel (using a Gigabit Ethernet connection) without any
>> problem.
>>
>> I did what Jim suggested in his previous email, but I still have the
>> same problem.
>>
>> I have attached the instructions again with this email.
>>
>> I am wondering if there are any relevant changes between the two versions.
>>
>> Would you suggest any solution for this issue?
>>
>> Thanks
>>
>>
>>
>> /*Morad Alawneh*/
>>
>> *Department of Chemistry and Biochemistry*
>>
>> *C100 BNSN, BYU*
>>
>> *Provo, UT 84602*
>>
>>
>>
>> Jim Phillips wrote:
>>>
>>> I can't tell much from just a segfault. Does the charm++ megatest
>>> work? Does NAMD run on one processor? Is there *any* output at all?
>>>
>>> My only comments looking at your build script are that on the charm
>>> ./build line "-language charm++ -balance rand" shouldn't be needed and
>>> may be harmful. Also, you shouldn't need "CHARMOPTS = -thread
>>> pthreads -memory os" with the TopSpin MPI library. It looks like
>>> you're following
>>> http://www.ks.uiuc.edu/Research/namd/wiki/?NamdOnInfiniBand but using
>>> the VMI build instructions. Also, please use the charm-5.9 source
>>> distributed with the NAMD source code, since this is the stable tree.
>>>
>>> -Jim
>>>
>>>
>>> On Mon, 21 Aug 2006, Morad Alawneh wrote:
>>>
>>>> Dear users,
>>>>
>>>> I have successfully installed NAMD 2.6b1 on my system (the installation
>>>> instructions are attached with this email), and the program was working
>>>> without any problem.
>>>>
>>>> I followed the same procedure to install NAMD 2.6b2, but after submitting
>>>> a job I received the following message in the error log file:
>>>>
>>>> bash: line 1: 31904 Segmentation fault /usr/bin/env MPIRUN_MPD=0
>>>> MPIRUN_HOST=m4a-7-11.local MPIRUN_PORT=40732
>>>> MPIRUN_PROCESSES='m4a-7-11i:m4a-7-11i:m4a-7-11i:m4a-7-11i:m4a-7-10i:m4a-7-10i:m4a-7-10i:m4a-7-10i:m4a-7-9i:m4a-7-9i:m4a-7-9i:m4a-7-9i:m4a-7-8i:m4a-7-8i:m4a-7-8i:m4a-7-8i:m4a-7-7i:m4a-7-7i:m4a-7-7i:m4a-7-7i:m4a-7-6i:m4a-7-6i:m4a-7-6i:m4a-7-6i:m4a-7-5i:m4a-7-5i:m4a-7-5i:m4a-7-5i:m4a-7-4i:m4a-7-4i:m4a-7-4i:m4a-7-4i:m4a-6-24i:m4a-6-24i:m4a-6-24i:m4a-6-24i:m4a-6-23i:m4a-6-23i:m4a-6-23i:m4a-6-23i:m4a-6-22i:m4a-6-22i:m4a-6-22i:m4a-6-22i:m4a-6-21i:m4a-6-21i:m4a-6-21i:m4a-6-21i:m4a-6-20i:m4a-6-20i:m4a-6-20i:m4a-6-20i:m4a-6-19i:m4a-6-19i:m4a-6-19i:m4a-6-19i:m4a-6-18i:m4a-6-18i:m4a-6-18i:m4a-6-18i:m4a-6-17i:m4a-6-17i:m4a-6-17i:m4a-6-17i:m4a-6-16i:m4a-6-16i:m4a-6-16i:m4a-6-16i:m4a-6-15i:m4a-6-15i:m4a-6-15i:m4a-6-15i:m4a-6-14i:m4a-6-14i:m4a-6-14i:m4a-6-14i:m4a-6-13i:m4a-6-13i:m4a-6-13i:m4a-6-13i:m4a-6-12i:m4a-6-12i:m4a-6-12i:m4a-6-12i:m4a-6-11i:m4a-6-11i:m4a-6-11i:m4a-6-11i:m4a-6-10i:m4a-6-10i:m4a-6-10i:m4a-6-10i:m4a-6-9i:m4a-6-9i:m4a-6-9i:m4a-6-9i:m4a-6-8i:m4a-6-8i:m4a-6-8i:m4a-6-8i:m4a-6-7i:m4a-6-7i:m4a-6-7i:m4a-6-7i:m4a-6-6i:m4a-6-6i:m4a-6-6i:m4a-6-6i:m4a-6-5i:m4a-6-5i:m4a-6-5i:m4a-6-5i:m4a-6-4i:m4a-6-4i:m4a-6-4i:m4a-6-4i:m4a-6-3i:m4a-6-3i:m4a-6-3i:m4a-6-3i:m4a-6-2i:m4a-6-2i:m4a-6-2i:m4a-6-2i:m4a-6-1i:m4a-6-1i:m4a-6-1i:m4a-6-1i:'
>>>>
>>>>
>>>> MPIRUN_RANK=16 MPIRUN_NPROCS=128 MPIRUN_ID=32469
>>>> /ibrix/home/mfm42/opt/namd-IB/Linux-amd64-MPI/namd2 +strategy USE_GRID
>>>> equil3_sys.namd
>>>>
>>>> Any suggestions for that kind of error will be appreciated.
>>>>
>>>>
>>>> My system info:
>>>>
>>>> A Dell 1855 Linux cluster whose nodes are each equipped with four Intel
>>>> Xeon EM64T processors (3.6 GHz) and 8 GB of memory. The nodes are
>>>> connected with InfiniBand, a high-speed, low-latency copper interconnect.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>> /*Morad Alawneh*/
>>>>
>>>> *Department of Chemistry and Biochemistry*
>>>>
>>>> *C100 BNSN, BYU*
>>>>
>>>> *Provo, UT 84602*
>>>>
>>>>
>>
