Re: Re: Is clock skew a problem for charm++

From: Jan Saam (saam_at_charite.de)
Date: Wed Jun 21 2006 - 08:52:54 CDT

Hi Gengbin and others,

Yes, the execution of the program really did take 7 minutes of wallclock
time. I asked the sysadmin to synchronize the clocks. This led to a
tremendous speedup, but running on several nodes is still slower than on
a single node:
'queens' now finishes in 6 s on 2 nodes (as opposed to 445 s before
synchronization), while on one node it returns after 2 s. Using up to 10
nodes brings the time down to 3 s, which still signifies very bad scaling:

[jan_at_BPU3 queens]$ mpirun -v -np 1 -machinefile ~/machines ./pgm 12 6
running
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/./pgm
on 1 LINUX ch_p4 processors
Created
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/PI13613
There are 14200 Solutions to 12 queens. Finish time=1.978130
End of program
[jan_at_BPU3 queens]$ vi ~/machines
[jan_at_BPU3 queens]$ mpirun -v -np 2 -machinefile ~/machines ./pgm 12 6
running
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/./pgm
on 2 LINUX ch_p4 processors
Created
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/PI13673
There are 14200 Solutions to 12 queens. Finish time=6.134362
End of program
[jan_at_BPU3 queens]$ mpirun -v -np 10 -machinefile ~/machines ./pgm 12 6
running
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/./pgm
on 10 LINUX ch_p4 processors
Created
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/PI14141
There are 14200 Solutions to 12 queens. Finish time=3.095719
End of program
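
(In case it is useful to anyone else without root access: the remaining
clock offsets between nodes can be estimated with a small MPI program along
the lines below. This is only a rough sketch of my own, not part of the
charm++ tree; the file name clockskew.c is arbitrary, and subtracting half
the measured roundtrip is just a crude correction for network delay.)

/* clockskew.c - estimate the wall-clock offset of every rank relative to
 * rank 0: rank 0 sends its gettimeofday() timestamp, each other rank
 * subtracts it from its own clock, and rank 0 corrects the result by half
 * of the measured roundtrip time. */
#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>

static double now(void)                     /* wall-clock time in seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(int argc, char **argv)
{
    int rank, size, p;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (p = 1; p < size; p++) {
            double t0 = now(), diff, rtt;
            MPI_Send(&t0, 1, MPI_DOUBLE, p, 0, MPI_COMM_WORLD);
            MPI_Recv(&diff, 1, MPI_DOUBLE, p, 1, MPI_COMM_WORLD, &status);
            rtt = now() - t0;
            printf("rank %d: clock offset approx %+.3f s (roundtrip %.6f s)\n",
                   p, diff - 0.5 * rtt, rtt);
        }
    } else {
        double t0, diff;
        MPI_Recv(&t0, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        diff = now() - t0;   /* local clock minus rank 0's timestamp,
                                still includes the one-way network delay */
        MPI_Send(&diff, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Compiled and run like the examples above, e.g.
mpicc clockskew.c -o clockskew && mpirun -np 10 -machinefile ~/machines ./clockskew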

To test the network latency I ran the pingpong test; the results are
below. Do these roundtrip times mean that the network latency is bad?

[jan_at_BPU3 pingpong]$ mpirun -v -np 1 -machinefile ~/machines ./pgm
running
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/tests/charm++/pingpong/./pgm
on 1 LINUX ch_p4 processors
Created
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/tests/charm++/pingpong/PI14581
Roundtrip time for 1D Arrays is 6.625700 us
Roundtrip time for 2D Arrays is 6.974200 us
Roundtrip time for 3D Arrays is 7.126500 us
Roundtrip time for Fancy Arrays is 7.569600 us
Roundtrip time for Chares (reuse msgs) is 2.916400 us
Roundtrip time for Chares (new/del msgs) is 4.059300 us
Roundtrip time for threaded Chares is 16.221500 us
Roundtrip time for Groups is 3.260800 us
End of program
[jan_at_BPU3 pingpong]$ mpirun -v -np 2 -machinefile ~/machines ./pgm
running
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/tests/charm++/pingpong/./pgm
on 2 LINUX ch_p4 processors
Created
/home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/tests/charm++/pingpong/PI14640
Roundtrip time for 1D Arrays is 235.032400 us
Roundtrip time for 2D Arrays is 230.769500 us
Roundtrip time for 3D Arrays is 229.962500 us
Roundtrip time for Fancy Arrays is 189.919500 us
Roundtrip time for Chares (reuse msgs) is 181.993800 us
Roundtrip time for Chares (new/del msgs) is 245.631200 us
Roundtrip time for threaded Chares is 206.101900 us
Roundtrip time for Groups is 182.176900 us
End of program
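
(For comparison, a bare MPI pingpong along the lines below, which is just a
rough sketch of my own and not part of the charm++ distribution, should show
how much of the ~200-250 us roundtrip comes from the ch_p4/TCP layer itself
and how much from charm++ on top of it. The single-processor numbers above
never leave the process, so they only measure charm++'s local messaging
overhead; for TCP over ordinary Ethernet, two-node roundtrips of a few
hundred microseconds would not be unusual.)

/* latency.c - plain MPI_Send/MPI_Recv pingpong between ranks 0 and 1,
 * averaged over many repetitions, for comparison with the charm++
 * pingpong output above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    char byte = 0;
    int rank, i;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {            /* send first, then wait for the echo */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {     /* echo every message straight back */
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average 1-byte roundtrip: %.1f us over %d repetitions\n",
               (t1 - t0) / reps * 1e6, reps);

    MPI_Finalize();
    return 0;
}

Built and run with e.g.
mpicc latency.c -o latency && mpirun -np 2 -machinefile ~/machines ./latency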

Thanks,
Jan

Gengbin Zheng wrote:
>
> Hi Jan,
>
> Clock skew may cause misleading time output, but I doubt it is the
> case here (queens program) because the time was printed from the same
> processor (0).
> When you run the program, did it really take 7 minutes wallclock time?
> Also, have you tried the pingpong test from charm/tests/charm++/pingpong
> to test the network latency?
>
> Gengbin
>
> Jan Saam wrote:
>
>> I forgot to say that I already checked that the problem is not ssh
>> taking forever to make a connection.
>> This is at least proven by this simple test:
>> time ssh BPU5 pwd
>> /home/jan
>>
>> real 0m0.236s
>> user 0m0.050s
>> sys 0m0.000s
>>
>> Jan
>>
>>
>> Jan Saam wrote:
>>
>>
>>> Hi all,
>>>
>>> I'm experiencing some weird performance problems with NAMD or the
>>> charm++ library on a Linux cluster:
>>> When I'm using NAMD or a simple charm++ demo program on one node
>>> everything is fine, but when I use more than one node each step takes
>>> _very_ much longer!
>>>
>>> Example:
>>> 2s for the program queens on 1 node, 445s on 2 nodes!!!
>>>
>>> running
>>> /home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/./pgm
>>>
>>> on 1 LINUX ch_p4 processors
>>> Created
>>> /home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/PI28357
>>>
>>> There are 14200 Solutions to 12 queens. Finish time=1.947209
>>> End of program
>>> [jan_at_BPU1 queens]$ mpirun -v -np 2 -machinefile ~/machines ./pgm 12 6
>>> running
>>> /home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/./pgm
>>>
>>> on 2 LINUX ch_p4 processors
>>> Created
>>> /home/jan/NAMD_2.6b1_Source/charm-5.9/mpi-linux-gcc/examples/charm++/queens/PI28413
>>>
>>> There are 14200 Solutions to 12 queens. Finish time=445.547998
>>> End of program
>>>
>>> The same is true when I build the net-linux version instead of
>>> mpi-linux, so the problem is probably independent of MPI.
>>>
>>> One thing I noticed is that there is a several-minute clock skew between
>>> the nodes. Could that be part of my problem (unfortunately I don't have
>>> the rights to simply synchronize the clocks)?
>>>
>>> Does anyone have an idea what the problem could be?
>>>
>>> Many thanks,
>>> Jan
>>>
>>>
>>>
>>
>>
>>
>

-- 
---------------------------
Jan Saam
Institute of Biochemistry
Charite Berlin
Monbijoustr. 2
10117 Berlin
Germany
+49 30 450-528-446
saam_at_charite.de
