HP Cluster Platform 4000

From: Hannes Loeffler (hannes.loeffler_at_stfc.ac.uk)
Date: Fri Dec 05 2008 - 07:52:02 CST


I'm trying to run NAMD on our HP Cluster Platform 4000 with AMD Opteron
254 processors. The OS is Red Hat Enterprise Linux AS release 4 (Nahant
Update 2).

I thought that the pre-compiled Linux-amd64 or Linux-amd64-TCP binaries
would be the right ones, but they evidently don't run in parallel on our
machine (at least, the run times for 2 to 32 processors are basically
the same).
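To put a number on "basically the same", one can compute the speedup relative to the smallest run, S_p = T_2 / T_p, and the parallel efficiency E_p = S_p / (p/2). A minimal Python sketch follows; the timings in it are invented for illustration and should be replaced with measured wall-clock times. A speedup close to 1 at 32 processors (efficiency near 1/16) is consistent with the extra processes contributing nothing to the run.

```python
def speedup_table(times):
    """Return {p: (speedup, efficiency)} relative to the smallest processor count.

    `times` maps processor count -> wall-clock time in seconds.
    """
    base_p = min(times)
    base_t = times[base_p]
    table = {}
    for p, t in sorted(times.items()):
        speedup = base_t / t                 # S_p = T_base / T_p
        efficiency = speedup / (p / base_p)  # ideal speedup is p / base_p
        table[p] = (speedup, efficiency)
    return table

if __name__ == "__main__":
    # Illustrative numbers only (not measured): nearly flat run times.
    times = {2: 1000.0, 4: 990.0, 8: 1005.0, 16: 995.0, 32: 1002.0}
    for p, (s, e) in speedup_table(times).items():
        print(f"{p:3d} procs: speedup {s:5.2f}, efficiency {e:6.1%}")
```

With perfect scaling every efficiency would be 100%; flat timings drive it toward base_p/p.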

Next, I tried to compile from source. I got an executable, but once
started the processes seem to hang in the queue without producing any
output until I kill the job. The output contains only the information
for step 0 and the message 'Charm++ Warning: Non Charm++ Message
Received.'

How I compiled my own binary:
* I downloaded the sources for NAMD 2.6, Tcl 8.3 (tcl-linux.tar.gz from
the NAMD site) and FFTW 2.1.5 (from the FFTW web site). Tcl and FFTW
compiled and installed without any problems.

* Charm++ was built with
  ./build charm++ mpi-linux-amd64 --no-shared -O -DCMK_OPTIMIZE=1
  The compiler was /opt/hpmpi/bin/mpiCC.

I modified src/arch/mpi-linux-amd64/conv-mach.sh to link against libmpi
instead of libmpich. The cluster comes with MPI libraries in /opt/hpmpi,
which I assume are HP's own MPI libraries. (Other options may be
/opt/hpmpi/MPICH1.2, something called Voltaire, and another one that
might be called Elan, though I am not sure whether any of them would be
useful. For both AMBER and GROMACS I used hpmpi.)

* NAMD was configured with ./config tcl fftw Linux-amd64-MPI and
compiled. Running ldd on the binary gives:

        libmpi.so.1 => /opt/hpmpi/lib/linux_amd64/libmpi.so.1
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003730e00000)
        libtcl8.3.so => /home/hhl/usr/lib/libtcl8.3.so
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000003731000000)
        libmpio.so.1 => /opt/hpmpi/lib/linux_amd64/libmpio.so.1
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003733800000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003733c00000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003730b00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003730900000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0
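To confirm that the binary resolves its MPI libraries from /opt/hpmpi rather than from an MPICH installation, the ldd output can be checked mechanically. A minimal Python sketch; the function names and the /opt/hpmpi/ prefix test are assumptions for illustration, not part of NAMD or HP-MPI:

```python
import re

# One ldd line: "name => path", plus an optional "(0x...)" load address.
LDD_LINE = re.compile(r"\s*(\S+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")

def mpi_libs(ldd_output):
    """Map each library whose name contains "mpi" to its resolved path."""
    libs = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m and "mpi" in m.group(1):
            libs[m.group(1)] = m.group(2)
    return libs

def check_hpmpi(ldd_output, prefix="/opt/hpmpi/"):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name, path in mpi_libs(ldd_output).items():
        if path == "not found":
            problems.append(f"{name} not found")
        elif "mpich" in name:
            problems.append(f"{name} is an MPICH library")
        elif not path.startswith(prefix):
            problems.append(f"{name} resolves outside {prefix}: {path}")
    return problems

if __name__ == "__main__":
    # Excerpt of the ldd listing above.
    sample = """\
        libmpi.so.1 => /opt/hpmpi/lib/linux_amd64/libmpi.so.1
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003730e00000)
        libmpio.so.1 => /opt/hpmpi/lib/linux_amd64/libmpio.so.1
"""
    print(check_hpmpi(sample))  # prints []: both MPI libraries are under /opt/hpmpi/
```

On the cluster, the input could come from subprocess.run(["ldd", "/path/to/namd2"], capture_output=True, text=True).stdout. This only checks linkage; it says nothing about whether mpirun starts the ranks correctly.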

Jobs are run with /opt/hpmpi/bin/mpirun -srun /path/to/namd2 md.in and
submitted via bsub (LSF 6.1).

Any advice would be welcome.
