32-bit + 64-bit mixed architecture problem

From: Yogesh Aher (aher.yogesh_at_gmail.com)
Date: Wed Nov 11 2009 - 10:17:27 CST

Dear NAMD users,

I compiled NAMD with Open MPI separately for my 32-bit and 64-bit machines (each machine keeps its own architecture-specific executable, installed under the same name and path), and now, when I try to run both executables together in one parallel job, I get the following error (the hostfile and per-host layout I am using are sketched after the log below):

user_at_studpcxx:~$ mpirun -hostfile machines namd namd.conf

Charm++> Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE
(max supported: MPI_THREAD_SINGLE)
Charm++> cpu topology info is being gathered.
Charm++> 2 unique compute nodes detected.
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.

------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.

[2] Stack Traceback:
  [0] CmiAbort+0x25 [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22 [0x8367c20]
  [3] CsdScheduleForever+0x67 [0x8367dd2]
  [4] CsdScheduler+0x12 [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
  [7] main+0x2e [0x80f65b6]
[3] Stack Traceback:
  [0] CmiAbort+0x25 [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22 [0x8367c20]
  [3] CsdScheduleForever+0x67 [0x8367dd2]
  [4] CsdScheduler+0x12 [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
  [7] main+0x2e [0x80f65b6]
  [8] __libc_start_main+0xd3 [0x88fde3]
  [9] __gxx_personality_v0+0x101 [0x80f3405]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
  [8] __libc_start_main+0xd3 [0x31cde3]
  [9] __gxx_personality_v0+0x101 [0x80f3405]
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 16227 on
node sibar exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[studpcxx.xxx.xxx.xx][[7575,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
Info: NAMD 2.6 for Linux-amd64-MPI
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60102 for mpi-linux-x86_64
[studpcxx.xxx.xxx.xx:27457] 1 more process has sent help message
help-mpi-api.txt / mpi-abort
[studpcxx.xxx.xxx.xx:27457] Set MCA parameter "orte_base_help_aggregate" to
0 to see all help / error messages
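
For reference, this is roughly how my hostfile and per-host layout look; the hostnames, slot counts, and install path below are placeholders rather than my exact values:

  # hostfile "machines": one 64-bit node and one 32-bit node
  host64 slots=2    # runs the Linux-amd64 build of namd
  host32 slots=2    # runs the Linux-i686 build of namd

  # each node keeps its own architecture-specific binary at the same path,
  # e.g. /usr/local/bin/namd, so the single command below is launched on both:
  mpirun -hostfile machines namd namd.conf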

Has anybody tried to run NAMD in such a mixed environment of 32-bit and 64-bit machines?

I look forward to hearing about a solution.

Thank you,
Sincerely,
Yogesh
