Re: Running namd on multiple win32 nodes

From: Ajasja Ljubetič (ajasja.ljubetic_at_gmail.com)
Date: Wed Jul 28 2010 - 07:18:16 CDT

This is the output with the ++verbose option. The error at the end appears
when I manually terminate the charmd and namd2 processes on the nodes.

Charmrun> charmrun started...
Charmrun> using nodelist as nodesfile
Charmrun> adding client 0: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 1: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 2: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 3: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 4: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 5: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 6: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 7: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 8: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 9: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 10: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 11: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 12: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 13: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> adding client 14: "EPR-ROKAVI3", IP:194.249.230.39
Charmrun> adding client 15: "EPR-ROKAVI4", IP:194.249.230.40
Charmrun> Charmrun = 194.249.230.64, port = 59782
Charmrun> packing arg: L35_WAT_min_fix1.conf
Charmrun> Starting node program 0 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 0 started.
Charmrun> Starting node program 1 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 1 started.
Charmrun> Starting node program 2 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 2 started.
Charmrun> Starting node program 3 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 3 started.
Charmrun> Starting node program 4 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 4 started.
Charmrun> Starting node program 5 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 5 started.
Charmrun> Starting node program 6 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 6 started.
Charmrun> Starting node program 7 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 7 started.
Charmrun> Starting node program 8 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 8 started.
Charmrun> Starting node program 9 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 9 started.
Charmrun> Starting node program 10 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 10 started.
Charmrun> Starting node program 11 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 11 started.
Charmrun> Starting node program 12 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 12 started.
Charmrun> Starting node program 13 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 13 started.
Charmrun> Starting node program 14 on 'EPR-ROKAVI3' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 14 started.
Charmrun> Starting node program 15 on 'EPR-ROKAVI4' as C:\NAMD2.73b\namd2.exe.
Charmrun> Node program 15 started.
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> Waiting for 2-th client to connect.
Charmrun> Waiting for 3-th client to connect.
Charmrun> Waiting for 4-th client to connect.
Charmrun> client 0 connected (IP=194.249.230.39 data_port=1201)
Charmrun> client 1 connected (IP=194.249.230.40 data_port=2941)
Charmrun> client 2 connected (IP=194.249.230.39 data_port=1203)
Charmrun> client 3 connected (IP=194.249.230.40 data_port=2943)
Charmrun> client 4 connected (IP=194.249.230.39 data_port=1205)
Charmrun> Waiting for 5-th client to connect.
Charmrun> client 15 connected (IP=194.249.230.40 data_port=2955)
Charmrun> Waiting for 6-th client to connect.
Charmrun> client 5 connected (IP=194.249.230.40 data_port=2945)
Charmrun> Waiting for 7-th client to connect.
Charmrun> client 6 connected (IP=194.249.230.39 data_port=1207)
Charmrun> Waiting for 8-th client to connect.
Charmrun> client 7 connected (IP=194.249.230.40 data_port=2947)
Charmrun> Waiting for 9-th client to connect.
Charmrun> client 8 connected (IP=194.249.230.39 data_port=1209)
Charmrun> Waiting for 10-th client to connect.
Charmrun> client 9 connected (IP=194.249.230.40 data_port=2949)
Charmrun> Waiting for 11-th client to connect.
Charmrun> client 10 connected (IP=194.249.230.39 data_port=1211)
Charmrun> Waiting for 12-th client to connect.
Charmrun> client 11 connected (IP=194.249.230.40 data_port=2951)
Charmrun> Waiting for 13-th client to connect.
Charmrun> client 12 connected (IP=194.249.230.39 data_port=1213)
Charmrun> Waiting for 14-th client to connect.
Charmrun> client 13 connected (IP=194.249.230.40 data_port=2953)
Charmrun> Waiting for 15-th client to connect.
Charmrun> client 14 connected (IP=194.249.230.39 data_port=1215)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
CmiMemory: fences and atomic operations not available in native assembly
[0] isomalloc.c> Disabling isomalloc because mmap() does not work
Charmrun: error on request socket--
Error on socket recv!
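
A log like the one above is long enough that a missing connection is easy to overlook. The following is a minimal sketch, not part of charmrun or NAMD, that scans saved ++verbose output and reports any node program that was started but never reported as connected. The function name summarize and the regular expressions are assumptions based only on the line formats visible in this log.

```python
import re

# Patterns assumed from the charmrun output shown above.
STARTED = re.compile(r"Charmrun> Node program (\d+) started\.")
CONNECTED = re.compile(
    r"Charmrun> client (\d+) connected \(IP=([\d.]+) data_port=(\d+)\)"
)


def summarize(log_text):
    """Return (started, connected, missing) for a charmrun verbose log.

    started   -- set of node program numbers reported as started
    connected -- dict mapping client number to (ip, data_port)
    missing   -- sorted list of started programs that never connected
    """
    started = {int(m.group(1)) for m in STARTED.finditer(log_text)}
    connected = {
        int(m.group(1)): (m.group(2), int(m.group(3)))
        for m in CONNECTED.finditer(log_text)
    }
    missing = sorted(started - set(connected))
    return started, connected, missing


# Example use (the file name is hypothetical):
#   with open("charmrun_verbose.log") as fh:
#       started, connected, missing = summarize(fh.read())
#   print("missing clients:", missing)
```

For the log above this would report no missing clients, since all 16 node programs both start and connect; the hang therefore occurs after the connection phase.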

On Wed, Jul 28, 2010 at 11:42, Ajasja Ljubetič <ajasja.ljubetic_at_gmail.com> wrote:

> Dear all,
>
> I think it is a shame that the later versions of NAMD (I think after 2.5)
> are not supported on multiple Windows nodes,
> especially since I have 8 nodes, each with 8 processors, running 32-bit
> Windows XP and connected by gigabit LAN :)
>
> I have tried using charmrun to run a simulation on many nodes. I downloaded
> the charm-6.2.1_net-win32_production binaries and ran the charmd on two
> nodes.
> Using
>
> ..\charmrun.exe C:\NAMD2.73b\namd2.exe L35_WAT_min_fix1.conf ++nodelist
> nodelist ++p 16 > test.log
>
> I am able to get charmrun to run 8 copies of namd2.exe (by copying the namd
> and input files to the same path on both nodes)
> charmd reports something like
>
> Listening for requests on port 12396
> Connection from IP 194.249.230.64, port 49174 at Wed Jul 28 11:35:57 2010
> Invoking 'C:\NAMD2.73b\namd2.exe'
> and argLine ' L35_WAT_min_fix1.conf'
> and environment 'NETSTART=14 194.249.230.64 65532 1488 0'
> in 'C:\NAMD2.73b\test'
>
> The processor usage on both nodes goes to 100%, but nothing gets written to
> the output. The log file test.log contains
>
> CmiMemory: fences and atomic operations not available in native assembly
> [0] isomalloc.c> Disabling isomalloc because mmap() does not work
>
> Does anyone know where the problem is? Do I need to compile NAMD myself to
> support charmrun? (and will it suffice to change
> CHARMARCH = multicore-win32 to CHARMARCH = net-win32 in the make files?)
> And is there a good reason that NAMD on multiple Windows nodes
> is no longer supported?
>
> I also have sshd installed on each node. Is it better to use charmd or sshd
> or mpich2 (maybe it makes no difference on an internal network)?
>
> Any help is greatly appreciated.
> Best regards,
> Ajasja
>
>
>
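
The quoted command passes a file named nodelist to charmrun, but its contents are not shown in the thread. As a rough sketch, assuming the usual Charm++ layout of a "group main" line followed by one "host" line per machine, a file for the two hosts in the log could be generated as below. The helper names make_nodelist and assign_clients are hypothetical, and the round-robin placement only mirrors what the "adding client" lines in the ++verbose output show; it is not taken from the charmrun source.

```python
def make_nodelist(hosts, group="main"):
    """Build the text of a nodelist file: a group line, then one host per line."""
    lines = [f"group {group}"]
    lines += [f"host {name}" for name in hosts]
    return "\n".join(lines) + "\n"


def assign_clients(hosts, nprocs):
    """Round-robin placement, matching the 'adding client' lines in the log."""
    return [hosts[i % len(hosts)] for i in range(nprocs)]


hosts = ["EPR-ROKAVI3", "EPR-ROKAVI4"]
nodelist_text = make_nodelist(hosts)
placement = assign_clients(hosts, 16)

# Writing the file is left to the caller, for example:
#   with open("nodelist", "w") as fh:
#       fh.write(nodelist_text)
```

With ++p 16 and two hosts, this placement puts eight processes on each machine, which is consistent with the alternating host names in the log.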
