Re: 32 bit + 64 bit mixed architecture problem

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Thu Nov 12 2009 - 10:12:36 CST

On Thu, 2009-11-12 at 16:24 +0100, Yogesh Aher wrote:
> Dear Axel,

yogesh,

> Thank you very much for your nice suggestion. I transferred the
> libraries, executables and other necessary files from the 32-bit machine
> to the x86_64 one, modified PATH and LD_LIBRARY_PATH, and now it's
> working fine.
> But, I see some strange behaviour from the 32-bit machine, i.e. when I
> submit the job only with
>
> 32-host$ ./mpirun -host studpc01 namd namd.conf
>
> it gives the following error:
> bash: orted: command not found

that is an OpenMPI installation issue. when mpirun is invoked
through its full absolute path, OpenMPI derives its installation
prefix from that path and uses it to locate its ancillary programs
(such as orted) on the remote nodes; with a bare "./mpirun" the
remote shell's own PATH has to supply orted, which fails here.
try using the full absolute path to mpirun or the --prefix option.
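
for example (the install location below is just taken from your own
command line later in this mail, so adjust it to the real prefix on
your system), either of these should let the remote node find orted:

  # start mpirun via its absolute path, so it can work out its own prefix
  /home/user/folder/bin/mpirun -host studpc01 namd namd.conf

  # or pass the installation prefix explicitly
  mpirun --prefix /home/user/folder -host studpc01 namd namd.conf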

[...]

> But, if the full path to mpirun is given, then it works fine, i.e.
> 32-host$ /home/user/folder/bin/mpirun -host studpc01 namd namd.conf
>
> I correctly defined PATH & LD_LIBRARY_PATH in the .bashrc of studpc01,
> and submitting the job from studpc01 itself works perfectly both ways.
>
> Any idea about this behaviour?

see above. check out the OpenMPI documentation or ask/search
the openmpi users mailing list for more info.
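
as a quick diagnostic (just a sketch, assuming passwordless ssh to
studpc01), you can also compare what the non-interactive shell that
mpirun starts on the remote node actually sees; on many systems such
a shell does not pick up PATH settings made in .bashrc:

  # PATH of the remote non-interactive shell, and whether it finds orted
  ssh studpc01 'echo $PATH; which orted'

if orted does not show up there, stick with the absolute path or
the --prefix option.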

cheers,
  axel.

>
> Thanking you,
>
> Sincerely,
> Yogesh
>
>
> On Wed, Nov 11, 2009 at 7:08 PM, Axel Kohlmeyer <akohlmey_at_gmail.com>
> wrote:
> On Wed, 2009-11-11 at 17:17 +0100, Yogesh Aher wrote:
> > Dear NAMD users,
> >
> > I compiled NAMD using openmpi for 32-bit and 64-bit machines
> > separately (keeping them on individual machines with the same names)
> > and now when I would like to run both executables simultaneously in
> > parallel mode, it gives me the following error:
>
>
> yogesh,
>
> forget about the complications of running a mixed
> architecture setup and just run the 32-bit binaries throughout.
> using a 64-bit binary does not automatically imply
> that it runs faster(!). it is only on x86 architectures
> (due to the limited number of general purpose
> registers in 32-bit mode) that 64-bit binaries are at
> all competitive. on "proper" cpu architectures, the
> 64-bit version often has an overhead on the order of 10-20%,
> because pointer variables are twice as large
> and thus the CPU cache is used less efficiently.
>
> cheers,
> axel.
>
> >
> > user_at_studpcxx:~$ mpirun -hostfile machines namd namd.conf
> >
> > Charm++> Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE (max supported: MPI_THREAD_SINGLE)
> > Charm++> cpu topology info is being gathered.
> > Charm++> 2 unique compute nodes detected.
> > ------------- Processor 2 Exiting: Called CmiAbort ------------
> > Reason: Internal Error: Unknown-msg-type. Contact Developers.
> >
> > ------------- Processor 3 Exiting: Called CmiAbort ------------
> > Reason: Internal Error: Unknown-msg-type. Contact Developers.
> >
> > [2] Stack Traceback:
> > [0] CmiAbort+0x25 [0x8366f3e]
> > [1] namd [0x830d4cd]
> > [2] CmiHandleMessage+0x22 [0x8367c20]
> > [3] CsdScheduleForever+0x67 [0x8367dd2]
> > [4] CsdScheduler+0x12 [0x8367d4c]
> > [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
> > [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
> > [7] main+0x2e [0x80f65b6]
> > [3] Stack Traceback:
> > [0] CmiAbort+0x25 [0x8366f3e]
> > [1] namd [0x830d4cd]
> > [2] CmiHandleMessage+0x22 [0x8367c20]
> > [3] CsdScheduleForever+0x67 [0x8367dd2]
> > [4] CsdScheduler+0x12 [0x8367d4c]
> > [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
> > [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
> > [7] main+0x2e [0x80f65b6]
> > [8] __libc_start_main+0xd3 [0x88fde3]
> > [9] __gxx_personality_v0+0x101 [0x80f3405]
> >
> > --------------------------------------------------------------------------
> > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> >
> > --------------------------------------------------------------------------
> > [8] __libc_start_main+0xd3 [0x31cde3]
> > [9] __gxx_personality_v0+0x101 [0x80f3405]
> >
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 2 with PID 16227 on
> > node sibar exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> > --------------------------------------------------------------------------
> >
> > [studpcxx.xxx.xxx.xx][[7575,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
> > Info: NAMD 2.6 for Linux-amd64-MPI
> > Info:
> > Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> > Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
> > Info:
> > Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> > Info: in all publications reporting results obtained with NAMD.
> > Info:
> > Info: Based on Charm++/Converse 60102 for mpi-linux-x86_64
> > [studpcxx.xxx.xxx.xx:27457] 1 more process has sent help message help-mpi-api.txt / mpi-abort
> > [studpcxx.xxx.xxx.xx:27457] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> >
> > Has anybody tried to work with such a mixed environment of 32-bit &
> > 64-bit machines?
> >
> > Looking forward to hearing about the solution.
> >
> > Thanking you,
> > Sincerely,
> > Yogesh
> >
>
> --
> Dr. Axel Kohlmeyer akohlmey_at_gmail.com
> Institute for Computational Molecular Science
> College of Science and Technology
> Temple University, Philadelphia PA, USA.
>
>

-- 
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com 
Institute for Computational Molecular Science
College of Science and Technology
Temple University, Philadelphia PA, USA.
