Linux or Other Unix Workstation Networks

The same binaries used for individual workstations, as described above (other than pure "multicore" builds and MPI builds), can be used with charmrun to run in parallel on a workstation network. The only difference is that you must provide a "nodelist" file listing the machines where namd3 processes should run, for example:

  group main
  host brutus
  host romeo

The "group main" line defines the default machine list. Hosts brutus and romeo are the two machines on which to run the simulation. Note that charmrun itself may run on one of those machines or on a third machine. All machines used for a simulation must be of the same type and have access to the same namd3 binary.

By default, the "rsh" command is used to start namd3 on each node specified in the nodelist file. You can change this via the CONV_RSH environment variable; for example, to use ssh instead of rsh, run "setenv CONV_RSH ssh" or add it to your login or batch script. You must be able to connect to each node via rsh/ssh without typing your password; this can be accomplished via a .rhosts file in your home directory, by an /etc/hosts.equiv file installed by your sysadmin, or by a .ssh/authorized_keys file in your home directory. You should confirm that you can run "ssh hostname pwd" (or "rsh hostname pwd") without typing a password before running NAMD. Contact your local sysadmin if you have difficulty setting this up.

If you are unable to use rsh or ssh, then add "setenv CONV_DAEMON" to your script and run charmd (or charmd_faceless, which produces a log file) on every node.
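
The passwordless-login check described above can be run against every host in a nodelist at once. The following is a minimal sketch for sh or bash, not part of NAMD or Charm++: the helper names nodelist_hosts and check_hosts are hypothetical, and the default remote command assumes OpenSSH, whose BatchMode option makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# sh/bash form of the csh command "setenv CONV_RSH ssh".
export CONV_RSH=ssh

# Print the host names found in a nodelist file ("host NAME" lines).
# Hypothetical helper, not a charmrun feature.
nodelist_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

# Run "<remote command> HOST pwd" for every host in the nodelist.
#   $1: nodelist file
#   $2: remote command (assumed default: OpenSSH in batch mode)
# Returns non-zero if any host could not be reached without a password.
check_hosts() {
    remote=${2:-ssh -o BatchMode=yes}
    status=0
    for h in $(nodelist_hosts "$1"); do
        # $remote is deliberately unquoted so its options are split into words.
        if $remote "$h" pwd </dev/null >/dev/null 2>&1; then
            echo "ok: $h"
        else
            echo "FAILED: $h"
            status=1
        fi
    done
    return $status
}
```

For example, "check_hosts nodelist" tests ssh access to each listed host, and "check_hosts nodelist rsh" tests rsh instead.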

You should now be able to try running NAMD as:

  charmrun namd3 +p<procs> <configfile>

If this fails or just hangs, try adding the ++verbose option to see more details of the startup process. You may need to specify the full path to the namd3 binary. Charmrun will start the number of processes specified by the +p option, cycling through the hosts in the nodelist file as many times as necessary. You may list multiprocessor machines multiple times in the nodelist file, once for each processor.
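
Listing each multiprocessor machine once per processor is tedious to do by hand. As an illustration only, the sketch below generates such a nodelist in sh or bash; make_nodelist is a hypothetical helper name, and the sketch assumes every host has the same number of processors.

```shell
#!/bin/sh
# Print a nodelist in which each host appears once per processor.
# Usage: make_nodelist PROCS_PER_HOST HOST...
# Hypothetical helper; assumes all hosts have the same processor count.
make_nodelist() {
    per_host=$1
    shift
    echo "group main"
    for h in "$@"; do
        i=0
        while [ "$i" -lt "$per_host" ]; do
            echo "host $h"
            i=$((i + 1))
        done
    done
}
```

For two dual-processor machines, "make_nodelist 2 brutus romeo > nodelist" writes four host lines, which matches a run started with +p4.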

You may specify the nodelist file with the "++nodelist" option and the group (which defaults to "main") with the "++nodegroup" option. If you do not use "++nodelist", charmrun will first look for "nodelist" in your current directory and then ".nodelist" in your home directory.
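
When a run picks up an unexpected set of hosts, it can help to see which file that search order selects. The sketch below mimics the order described above in sh or bash; it is a debugging aid under that description, not charmrun's actual implementation, and find_nodelist is a hypothetical name.

```shell
#!/bin/sh
# Print the nodelist file that the documented search order would select:
# an explicitly named file first, then ./nodelist, then ~/.nodelist.
#   $1: file passed via ++nodelist, or empty if the option is not used
# Returns non-zero if no nodelist is found. Hypothetical helper.
find_nodelist() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    elif [ -f ./nodelist ]; then
        echo ./nodelist
    elif [ -f "$HOME/.nodelist" ]; then
        echo "$HOME/.nodelist"
    else
        return 1
    fi
}
```

Running "find_nodelist" with no argument from the directory where charmrun will be started prints the file that would be used by default.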

Some automounters use a temporary mount directory which is prepended to the path returned by the pwd command. To run on multiple machines in that case, you must add a "++pathfix" option to your nodelist file. For example:

  group main ++pathfix /tmp_mnt /
  host alpha1
  host alpha2

There are many other options to charmrun and for the nodelist file. These are documented in the Charm++ Installation and Usage Manual, available at http://charm.cs.uiuc.edu/manuals/, and a list of available charmrun options is printed by running charmrun without arguments.

If your workstation cluster is controlled by a queueing system, you will need to build a nodelist file in your job script. For example, if your queueing system provides a HOST_FILE environment variable:

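  # Read the host names assigned by the queueing system.
  set NODES = `cat $HOST_FILE`
  set NODELIST = $TMPDIR/namd3.nodelist
  # Start a fresh nodelist (">!" overwrites even if noclobber is set).
  echo group main >! $NODELIST
  foreach node ( $NODES )
    echo host $node >> $NODELIST
  end
  # Two processes per node (dual-processor machines).
  @ NUMPROCS = 2 * $#NODES
  charmrun namd3 +p$NUMPROCS ++nodelist $NODELIST <configfile>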

Note that NUMPROCS is twice the number of nodes in this example because the machines are assumed to be dual-processor. For single-processor machines, do not multiply $#NODES by two.

Note that these example scripts and the setenv command are for the csh or tcsh shells. They must be translated to work with sh or bash.
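
As one possible translation, the queueing-system example might look like the following in sh or bash. This is a sketch, not an official NAMD script: build_nodelist is a hypothetical helper name, HOST_FILE and TMPDIR are assumed to be set by the queueing system as in the csh example, and the processes-per-node count is passed as an argument instead of being fixed at two.

```shell
#!/bin/sh
# Write a nodelist from a host file and print the total process count.
# Usage: build_nodelist HOST_FILE NODELIST PROCS_PER_NODE
# Hypothetical helper mirroring the csh example in this section.
build_nodelist() {
    host_file=$1
    nodelist=$2
    procs_per_node=$3
    count=0
    # Start a fresh nodelist with the default group.
    echo "group main" > "$nodelist"
    # One "host" line per whitespace-separated name in the host file.
    for node in $(cat "$host_file"); do
        echo "host $node" >> "$nodelist"
        count=$((count + 1))
    done
    echo $((procs_per_node * count))
}

# In a job script (HOST_FILE and TMPDIR come from the queueing system):
#   NODELIST=$TMPDIR/namd3.nodelist
#   NUMPROCS=$(build_nodelist "$HOST_FILE" "$NODELIST" 2)
#   charmrun namd3 +p"$NUMPROCS" ++nodelist "$NODELIST" <configfile>
```

The charmrun line is left as a comment so that the function can be tried on its own; pass 1 as the last argument for single-processor machines.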


http://www.ks.uiuc.edu/Research/namd/