recompiling NAMD 2.6 from sources on CentOS-3 x86_64: Charm++ failing test

From: Tru Huynh (tru_at_pasteur.fr)
Date: Mon Nov 13 2006 - 10:16:46 CST

Hi,

I am trying to make namd-2.6 run on our CentOS-3 x86_64 machines.
The pre-built executables will not run, as they have been compiled with a
newer gcc compiler than ours (3.2.3).

[tru_at_sillage centos-3_x86_64]$ ldd NAMD_2.6_Linux-amd64*/namd2
NAMD_2.6_Linux-amd64/namd2:
        libdl.so.2 => /lib64/libdl.so.2 (0x0000002a9568e000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a95791000)
        libstdc++.so.6 => not found
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a9591a000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
NAMD_2.6_Linux-amd64-TCP/namd2:
        libdl.so.2 => /lib64/libdl.so.2 (0x0000002a9568e000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a95791000)
        libstdc++.so.6 => not found
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a9591a000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
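
To pull only the unresolved libraries out of ldd output like the above, a small filter can help. This is a minimal sketch; the function name missing_libs is hypothetical, and it assumes the usual "name => not found" layout that ldd prints here.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin so it can sit at the end of a pipeline.
# (missing_libs is an illustrative name, not part of NAMD or Charm++.)
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Example use (not run here):
#   ldd NAMD_2.6_Linux-amd64/namd2 | missing_libs
```

With the output shown above, the only entry it would report is libstdc++.so.6.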

I have downloaded the sources and followed the instructions in notes.txt:
<quote>
tar xzf NAMD_2.6_Source.tar.gz
cd NAMD_2.6_Source
tar xf charm-5.9.tar
cd charm-5.9
./build charm++ net-linux --no-shared -O -DCMK_OPTIMIZE=1
                                setenv CONV_RSH ssh (we are using ssh, not rsh)
cd net-linux/tests/charm++/megatest
make pgm
</quote>
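
Note that "setenv" is csh/tcsh syntax. In a Bourne-type shell (sh, bash) the same variable would be set as sketched below; this assumes only that CONV_RSH is read from the environment, as the quoted step implies.

```shell
# sh/bash equivalent of the csh line "setenv CONV_RSH ssh".
export CONV_RSH=ssh

# Confirm the variable is set in the current environment.
echo "CONV_RSH=$CONV_RSH"
```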

[tru_at_sillage megatest]$ ./charmrun ++local +p4 ./pgm
Megatest is running on 4 processors.
test 0: initiated [bitvector (jbooth)]
test 0: completed (0.00 sec)
test 1: initiated [immediatering (gengbin)]
test 1: completed (0.15 sec)
test 2: initiated [callback (olawlor)]
------------- Processor 0 Exiting: Caught Signal ------------
Signal: segmentation violation
Suggestion: Try running with '++debug', or linking with '-memory paranoid'.
Program finished.
Charmrun: error on request socket--
Socket closed before recv.

[tru_at_sillage megatest]$ ./charmrun ++local ++debug +p4 ./pgm
Charmrun> charmrun started...
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 127.0.0.1, port = 44772
Charmrun> start 0 node program on localhost.
Charmrun> start 1 node program on localhost.
Charmrun> start 2 node program on localhost.
Charmrun> start 3 node program on localhost.
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun> client 0 connected (IP=127.0.0.1 data_port=42485)
Charmrun> Waiting for 1-th client to connect.
Charmrun> client 2 connected (IP=127.0.0.1 data_port=42486)
Charmrun> Waiting for 2-th client to connect.
Charmrun> client 1 connected (IP=127.0.0.1 data_port=42488)
Charmrun> Waiting for 3-th client to connect.
Charmrun> client 3 connected (IP=127.0.0.1 data_port=42489)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Megatest is running on 4 processors.
test 0: initiated [bitvector (jbooth)]
test 0: completed (0.00 sec)
test 1: initiated [immediatering (gengbin)]
test 1: completed (0.13 sec)
test 2: initiated [callback (olawlor)]
Charmrun: error on request socket--
Socket closed before recv.

I also tried with only 1 process:
[tru_at_sillage megatest]$ ./charmrun ++local ++debug +p1 ./pgm
Charmrun> charmrun started...
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 127.0.0.1, port = 44777
Charmrun> start 0 node program on localhost.
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun> client 0 connected (IP=127.0.0.1 data_port=42489)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Megatest is running on 1 processors.
test 0: initiated [bitvector (jbooth)]
test 0: completed (0.00 sec)
test 1: initiated [immediatering (gengbin)]
test 1: completed (0.00 sec)
test 2: initiated [callback (olawlor)]
Charmrun: error on request socket--
Socket closed before recv.
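
All three runs above stop in the same place. To spot the test that was started but never finished in a longer megatest log, something like the following could be used. It is only a sketch: the name unfinished_tests is hypothetical, and it assumes the "test N: initiated" / "test N: completed" line format seen in the output above.

```shell
# Print the "initiated" line of every test that never logged "completed".
# Reads megatest output on stdin.
# (unfinished_tests is an illustrative name, not a Charm++ tool.)
unfinished_tests() {
    awk '
        /^test [0-9]+: initiated/ { id = $2; started[id] = $0; order[++n] = id }
        /^test [0-9]+: completed/ { delete started[$2] }
        END {
            # Report in the order the tests were started.
            for (i = 1; i <= n; i++)
                if (order[i] in started) print started[order[i]]
        }
    '
}

# Example use (not run here):
#   ./charmrun ++local +p4 ./pgm | unfinished_tests
```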

With pgm rebuilt using "-memory paranoid":
../../../bin/charmc -o pgm.paranoid megatest.o groupring.o nodering.o varsizetest.o varraystest.o groupcast.o nodecast.o synctest.o fib.o arrayring.o tempotest.o packtest.o queens.o migration.o marshall.o priomsg.o priotest.o rotest.o statistics.o templates.o inherit.o reduction.o callback.o immediatering.o bitvector.o -language charm++ -memory paranoid

[tru_at_sillage megatest]$ ./charmrun ++local ++debug +p4 ./pgm.paranoid
Charmrun> charmrun started...
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 127.0.0.1, port = 44923
Charmrun> start 0 node program on localhost.
Charmrun> start 1 node program on localhost.
Charmrun> start 2 node program on localhost.
Charmrun> start 3 node program on localhost.
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.

[tru_at_sillage ~]$ ps -ef
....
tru 26893 5193 0 17:13 pts/22 00:00:00 ./charmrun ++local ++debug +p4 ./pgm.paranoid
tru 26894 26893 0 17:13 pts/22 00:00:00 [pgm.paranoid <defunct>]
tru 26895 26893 0 17:13 pts/22 00:00:00 [pgm.paranoid <defunct>]
tru 26896 26893 0 17:13 pts/22 00:00:00 [pgm.paranoid <defunct>]
tru 26897 26893 0 17:13 pts/22 00:00:00 [pgm.paranoid <defunct>]
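
To check quickly how many node programs have already died while charmrun is still waiting, the ps output can be filtered for defunct entries. A minimal sketch, assuming the "[name <defunct>]" form shown above; count_defunct is a hypothetical helper name.

```shell
# Count zombie entries for one command name in "ps -ef" output read from stdin.
# $1: command name as it appears inside the brackets, e.g. pgm.paranoid
# (count_defunct is an illustrative name.)
count_defunct() {
    awk -v pat="[$1 <defunct>]" 'index($0, pat) { n++ } END { print n + 0 }'
}

# Example use (not run here):
#   ps -ef | count_defunct pgm.paranoid
```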

SSH login to localhost works without a password.

-- 
Dr Tru Huynh          | http://www.pasteur.fr/recherche/unites/Binfs/
mailto:tru_at_pasteur.fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France  
