BUG Report: NAMD 2.10 Nightly Build from 11.Sep

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Oct 11 2013 - 06:23:14 CDT

Hi,

I'd like to report the following behavior of the NAMD version mentioned in
the subject. NAMD REMD simulations seem to segfault if the minimum of two
processes required per replica are split across different nodes. This
happens, for example, if a queuing system uses machine files that contain
each node name only once, as the MPI will usually start the processes round
robin until it reaches its maximum count.

 

Example machinefile:

C1
C2
C3

Example distribution:

#P  node  replica
1   C01   0
2   C02   0
3   C03   1
4   C01   1
5   C02   2
6   C03   2

 

The above case always segfaults for me, whereas the following works
perfectly:

 

Example machinefile:

C1
C1
C2
C2
C3
C3

Example distribution:

#P  node  replica
1   C01   0
2   C01   0
3   C02   1
4   C02   1
5   C03   2
6   C03   2
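
For reference, both distributions can be reproduced with a small Python
sketch. It assumes the launcher walks the machinefile round robin and that
consecutive processes are grouped into replicas, as in the tables above.
The function names are made up for illustration and are not part of NAMD or
any MPI, and the node names are taken as written in the machinefiles.

```python
def round_robin(machinefile, nprocs):
    """Assign nprocs processes to machinefile entries in round-robin order."""
    return [machinefile[i % len(machinefile)] for i in range(nprocs)]


def distribution(machinefile, nprocs, procs_per_replica=2):
    """Return (process, node, replica) rows.

    Assumption: consecutive processes belong to the same replica.
    """
    nodes = round_robin(machinefile, nprocs)
    return [(i + 1, node, i // procs_per_replica) for i, node in enumerate(nodes)]


if __name__ == "__main__":
    for machinefile in (["C1", "C2", "C3"],
                        ["C1", "C1", "C2", "C2", "C3", "C3"]):
        print("#P node replica")
        for proc, node, replica in distribution(machinefile, 6):
            print(proc, node, replica)
        print()
```

With the three-entry machinefile every replica ends up on two different
nodes; with the six-entry one each replica stays on a single node.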

 

I guess a segfault isn't the expected behavior. If the current
implementation requires all processes of a replica to be on the same node,
it might be worth prechecking the distribution, or at least mentioning the
requirement in the manual.
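
Such a precheck could look roughly like the following Python sketch. This is
not NAMD code, only an illustration: the function name is hypothetical, and
it assumes that the processes are divided evenly among the replicas and that
consecutive processes belong to the same replica, as in the examples above.

```python
from collections import defaultdict


def check_distribution(nodes, num_replicas):
    """Return a list of problems; an empty list means the placement looks fine.

    nodes[i] is the host name on which process i runs (in rank order).
    """
    nprocs = len(nodes)
    if num_replicas < 1 or nprocs % num_replicas != 0:
        return [f"{nprocs} processes cannot be divided evenly "
                f"among {num_replicas} replicas"]

    problems = []
    per_replica = nprocs // num_replicas
    if per_replica < 2:
        problems.append("fewer than two processes per replica")

    # Collect the set of nodes used by each replica.
    hosts = defaultdict(set)
    for rank, node in enumerate(nodes):
        hosts[rank // per_replica].add(node)

    for replica, used in sorted(hosts.items()):
        if len(used) > 1:
            problems.append(
                f"replica {replica} is split over nodes {sorted(used)}")
    return problems


if __name__ == "__main__":
    # The segfaulting placement from the first example.
    for problem in check_distribution(["C1", "C2", "C3", "C1", "C2", "C3"], 3):
        print(problem)
```

Running a check like this before startup would allow a clear error message
instead of a segfault.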

 

Also, if one starts a REMD run with the number of processes equal to the
number of replicas, the simulation just does nothing but doesn't abort
either, which is also inconvenient.

 

Regards

 

Norman Geist

 
