Re: problems compiling charm and NAMD on intel cluster using MPI over ethernet

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Thu Dec 24 2009 - 02:30:34 CST

On Thu, Dec 24, 2009 at 12:48 AM, Corenflos, Steven
<scorenfl_at_indiana.edu> wrote:
> I'm currently trying to build NAMD on a parallel cluster of Intel Xeon quad-core, dual-processor computers. The nodes are currently communicating by a gigabit ethernet interconnect, which will be improved to 10GbE in late January. The goal is to overcome latency with bandwidth. All nodes are running RedHat Enterprise Linux 5.4, and we're using Intel's C/C++ and Fortran compilers.

you _cannot_ "overcome" latency with bandwidth. all you can do about latency
is hide it, i.e. do something else while you are waiting. with classical MD
there is not much work to be done relative to the amount of communication,
so your scalability will always suffer, and increasing the bandwidth will
not help with that at all. sorry.

> Due to the software we're writing, we want to run these simulations on top of MPI.

the best way to hide latencies with NAMD is to use the charm++ layer directly
on top of TCP (or UDP).
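
for example, a build line along the lines of your net-linux one below, with
the tcp option added to select TCP instead of the default UDP (a sketch only;
adjust the options to your charm++ version and compiler setup):

  ./build charm++ net-linux-x86_64 tcp icc ifort -j16 -O2 -DCMK_OPTIMIZE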

[...]

> Now we get to Charm, which is where I'm running into problems. What options should I be sending to the build command? The nodes are connected via ethernet, which makes me think I need to use the following:
> charm++ net-linux-x86_64 icc  smp   ifort   -j16  -O2 -DCMK_OPTIMIZE

this will bypass MPI and use charm++ directly on top of UDP.

> (I'm using the smp flag because I read that it could improve performance with my hardware setup.)
>
> However since this is supposed to be running over OpenMPI, do I instead need to use an MPI command, like:
> charm++ mpi-linux-amd64 icc smp tcp ?
>
> Going through the auto ./build command gave me this:
> ./build charm++ mpi-linux-x86_64   smp  mpicxx  ifort   -j16  -O2 -DCMK_OPTIMIZE

> I'm very confused on how to do this properly for mpi over ethernet. I went with the last one since that's what the build script gave me. Is this the correct thing to do?

when you use MPI as the communication layer for charm++, then the MPI API is
what is being used and the actual low-level transport doesn't matter. i have
built NAMD binaries that use OpenMPI and can run them on top of either TCP/IP,
infiniband or myrinet, depending on which communication layer i choose with
mpirun. i set these defaults differently for different clusters, and then the
very same binary works on all of them.
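
with OpenMPI the transport is selected at run time via MCA parameters, e.g.
something like this (a sketch; the exact BTL names and what is available
depend on your OpenMPI installation, and sim.namd is just a placeholder
input file):

  # force the TCP transport over ethernet
  mpirun --mca btl tcp,self -np 64 namd2 sim.namd
  # use infiniband instead, with the very same binary
  mpirun --mca btl openib,self -np 64 namd2 sim.namd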

> Anyway, so I'm trying to compile namd with the charm from the auto-generated build command. In the arch directory I set fftw to the version I've compiled in /opt/fftw and the tcl to version 8.4 provided by the RedHat system, in the appropriate directories. The config line I use is:
> ./config Linux-x86_64-MPI-icc --charm-arch mpi-linux-x86_64-ifort-smp-mpicxx
>
> Everything is chugging along just fine until I get hit with the following error:
> src/ComputePme.C(12): catastrophic error: could not open source file "sfftw.h"
>  #include <sfftw.h>
>
> This simply is not in my fftw folder. Where did I go wrong here? Did I not compile FFTW properly? Is there some command line argument I should use instead?

nothing is wrong here. redhat uses a different naming convention for fftw-2:
the single-precision version is not generated with the "s" prefix that NAMD
expects. i.e. you can create a symbolic link named sfftw.h pointing to fftw.h,
and do the same for the corresponding libraries.
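
something along these lines, assuming your single-precision fftw-2 headers
and libraries live under /opt/fftw (adjust the paths and library names to
your actual install):

  cd /opt/fftw/include
  ln -s fftw.h  sfftw.h
  ln -s rfftw.h srfftw.h
  cd /opt/fftw/lib
  ln -s libfftw.a  libsfftw.a
  ln -s librfftw.a libsrfftw.a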

cheers,
    axel.
>
> Thank you for all your help. I apologize for the length and complexity of this post.
>
> -Steve
>
>

-- 
Dr. Axel Kohlmeyer    akohlmey_at_gmail.com
Institute for Computational Molecular Science
College of Science and Technology
Temple University, Philadelphia PA, USA.
