NAMD Wiki: NamdOnBlueGene
NAMD has been ported to the BG/L platform, and serial and parallel performance are greatly improved.
The following config file options should improve NAMD scalability on Blue Gene/L. The twoAway options improve scalability. The energy loop is currently expensive on Blue Gene/L, so computing energies frequently increases both computation and communication.
twoAwayX yes #when (numatoms/numcpus) < 500
twoAwayY yes #when (numatoms/numcpus) < 50
LdbPeriod 20000 #slow down the loadbalancing frequency for long runs
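The atoms-per-CPU thresholds in the comments above can be checked with a few lines of Python. This is only a sketch: the function name `two_away_options` is hypothetical and not part of NAMD, and the ApoA1 atom count (92,224) comes from the standard benchmark rather than from this page.

```python
from typing import List


def two_away_options(num_atoms: int, num_cpus: int) -> List[str]:
    """Return the twoAway config lines suggested by the thresholds above.

    Hypothetical helper, not a NAMD tool. The thresholds (500 and 50
    atoms per CPU) are copied from the config comments on this page.
    """
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    atoms_per_cpu = num_atoms / num_cpus
    lines = []
    if atoms_per_cpu < 500:
        lines.append("twoAwayX yes")
    if atoms_per_cpu < 50:
        lines.append("twoAwayY yes")
    return lines


if __name__ == "__main__":
    # ApoA1 benchmark size (assumption: 92,224 atoms).
    for cpus in (128, 512, 2048):
        print(cpus, two_away_options(92224, cpus))
```

With these inputs, no twoAway option is suggested at 128 CPUs, twoAwayX alone at 512, and both at 2048.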
In virtual node mode the TXYZ mapping works better. This can be achieved from the command line option
As of March 22, 2007 new BlueGeneL binaries of NAMD 2.6 are available for download. These binaries use an unreleased low-level interface to the communications layer and therefore scale better than what you could build yourself using MPI. If you want to build your own version, first beg IBM for the communication library, then build the latest Charm++ (bluegenel-xlc), and finally NAMD (BlueGeneL-xlC).
We will not be releasing NAMD binaries for this platform, since we have little confidence in their portability between OS releases. Single-precision FFTW libraries (stolen from SDSC) and a hacked version of Tcl (because of all the missing system calls, as both source and a library) are at http://www.ks.uiuc.edu/Research/namd/libraries/ so you just need to build charm (mpi-bluegenel-xlc) and NAMD (BlueGeneL-MPI-xlC). Note that charm-5.9 won't work on BG, so you'll need to get a newer release or the developmental build (see http://charm.cs.uiuc.edu/download ).
As of Feb 24, 2006 I have fixed an ancient bug in BroadcastMgr.C that was causing a memory leak on node 0 (the one for which memory usage is reported). The leak occurs in simulations with more than 64 processors or more processors than patches (i.e., when node 0 has no patches) that use minimization, constant pressure, velocity rescaling, or some other method that relies on feedback from a global quantity. This is being posted here because BG/L users are very likely to encounter and suffer from this leak.
If you get "ORB PME allocator failed" errors when running a particular system with PME on a certain number of processors, try adding "twoAwayX yes" to the config file. This will be fixed.
In comparison to the NCSA Altix (our fastest platform) 1 Altix CPU = 2 Xeon CPUs = 8 BG CPUs (4 BG nodes). The machine does scale very well, which makes up for that. The single rack at SDSC (1024 nodes, 2048 CPUs) is equivalent to 256 Altix CPUs (1/4 of NCSA's current machine).
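The conversion above is simple arithmetic, sketched here in Python. The constant and function names are made up for illustration; the ratios are the ones quoted in the paragraph and are rough rules of thumb, not measurements.

```python
# Rough CPU equivalences quoted above: 1 Altix CPU = 2 Xeon CPUs = 8 BG CPUs.
BG_CPUS_PER_ALTIX_CPU = 8
XEON_CPUS_PER_ALTIX_CPU = 2
BG_CPUS_PER_NODE = 2  # a BG/L node has 2 CPUs (1024 nodes = 2048 CPUs)


def altix_equivalent(bg_nodes: int) -> float:
    """Estimate how many Altix CPUs a given number of BG/L nodes matches."""
    return bg_nodes * BG_CPUS_PER_NODE / BG_CPUS_PER_ALTIX_CPU


if __name__ == "__main__":
    # The single SDSC rack: 1024 nodes.
    print(altix_equivalent(1024))
```

For the 1024-node SDSC rack this gives 256 Altix CPUs, matching the figure in the text.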
These are the current performance results for NAMD with the ApoA1 system and a 12A cutoff and with PME multiple time stepping (the standard NAMD benchmark):
nodes   processors   time/step (SDSC)
256     512          30.4 ms
512     1024         18.2 ms
1024    2048         11.0 ms
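To put a number on how well the SDSC timings scale, the speedup and parallel efficiency relative to the smallest run can be computed as below. This is a sketch: the helper name `scaling` is hypothetical, and the 512-processor run is taken as the baseline only because it is the smallest one listed.

```python
# (processors, time per step in ms) from the SDSC table above.
SDSC_RESULTS = [(512, 30.4), (1024, 18.2), (2048, 11.0)]


def scaling(results):
    """Return (processors, speedup, efficiency) relative to the first entry.

    speedup    = baseline time / time
    efficiency = speedup / (processors / baseline processors)
    """
    base_procs, base_time = results[0]
    rows = []
    for procs, time_ms in results:
        speedup = base_time / time_ms
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for procs, speedup, efficiency in scaling(SDSC_RESULTS):
        print(f"{procs:5d}  speedup {speedup:4.2f}  efficiency {efficiency:4.0%}")
```

Relative to 512 processors, the efficiency works out to roughly 84% at 1024 processors and 69% at 2048.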
Here are some figures from the Argonne National Laboratory machine, using the same configuration as the results above.
nodes   processors   mode   time/step (MCS/ANL)
16      16           co     826.4 ms
32      32           co     404.8 ms
64      64           co     217.2 ms
128     128          co     123.0 ms
-Chee Wai Lee, 12/13/2005 email@example.com