Re: Using nodelist file causes namd to hang

From: Douglas Houston (DouglasR.Houston_at_ed.ac.uk)
Date: Sat Sep 20 2014 - 06:14:29 CDT

Hi Norman,

What I'm trying to get at is: Is there any point in running NAMD over
an ethernet-linked cluster?

I can't find anything in the NAMD documentation regarding minimum
recommended switch speeds.

Can you please tell me the model of the ethernet switch used in the
"hundreds of nodes" cluster you cite as an example, or indeed the model
of any switch used in any ethernet-linked cluster that successfully
speeds up NAMD?

cheers,
Doug

Quoting Norman Geist <norman.geist_at_uni-greifswald.de> on Thu, 18 Sep
2014 19:32:23 +0200:

>
> This should just show whether the two nodes connected directly are
> faster than when using the switch, which would point to your switch
> being too slow ;) Of course NAMD can run over hundreds of nodes, but
> gigabit is limited...
>
> On Thursday, 18-09-2014 at 12:35, Douglas Houston wrote:
>
>
> Hi Norman,
>
> We have an 802.3ab 1000BASE-T Gigabit Ethernet switch (Netgear GS205)
> that is dedicated to connecting the nodes of this cluster only (so no
> additional traffic). Assuming removing this switch will allow two
> nodes to show a speedup relative to one node, would this really
> represent a solution?
>
> It was my understanding that NAMD is at least somewhat parallelisable
> across an ordinary ethernet-linked cluster of nodes. Is this not the
> case?
>
> Can anyone on this mailing list tell me if they have successfully
> noted a speed increase by running NAMD across an ethernet-linked
> cluster vs. a single node? If so, could they please list their network
> hardware, e.g. ethernet cards, switch, number of nodes, cores per
> node, size of system benchmarked, etc.
>
> cheers,
> Doug
>
> P.S. 64 minutes of simulation runtime across 2 of the nodes results in
> a total of 390 GiB of data transferred between them (according to
> ifconfig), which equates to about 100 MiB/sec. This is for an
> 80,000-atom system. For my small 5,000-atom system it shows about 60
> MiB/sec. Does this mean that, for the large system at least, the
> bandwidth could indeed be saturating (100 MiB/sec being not far off
> 1 Gbit/sec)? If this is the case, it is not clear to me why the data
> transfer rate is so high, or whether anything can actually be done
> about it.
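
The arithmetic in the P.S. above can be checked with a short sketch.
It uses only the two figures quoted there (390 GiB, 64 minutes). Two
things are assumptions rather than anything measured in this thread:
the link is taken at its nominal 1000BASE-T rate of 10**9 bit/s with no
allowance for protocol overhead, and the 390 GiB is treated as flowing
in one direction. If the ifconfig total is the sum of RX and TX, the
per-direction load on a full-duplex link would be lower than shown.

```python
# Figures quoted in the P.S.; nothing here is newly measured.
GIB = 1024 ** 3
MIB = 1024 ** 2

bytes_transferred = 390 * GIB   # total reported by ifconfig
runtime_seconds = 64 * 60       # 64 minutes of simulation runtime

# Average transfer rate over the whole run.
rate_mib_s = bytes_transferred / runtime_seconds / MIB

# Assumption: nominal gigabit line rate, ignoring Ethernet/IP/TCP
# framing overhead, so the usable payload rate is somewhat lower.
line_rate_mib_s = 1e9 / 8 / MIB

# Fraction of the nominal line rate that the average represents.
utilisation = rate_mib_s / line_rate_mib_s

print(f"{rate_mib_s:.0f} MiB/s of {line_rate_mib_s:.0f} MiB/s nominal "
      f"({utilisation:.0%})")
```

So the quoted "about 100 MiB/sec" is roughly 104 MiB/sec, against a
nominal ceiling of about 119 MiB/sec. This is an average over the run,
so it says nothing about how bursty the traffic was.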
>
> Quoting Norman Geist on Wed, 10 Sep 2014 15:14:37 +0200:
>
>> Benchmark the timing over two nodes, without the switch. Sometimes the
>> switches are very slow, especially if other ports are active at the same
>> time.
>>
>> Norman Geist.
>>
>
>
>
> _____________________________________________________
> Dr. Douglas R. Houston
> Lecturer
> Institute of Structural and Molecular Biology
> Room 3.23, Michael Swann Building
> King's Buildings
> University of Edinburgh
> Edinburgh, EH9 3JR, UK
> Tel. 0131 650 7358
> http://tinyurl.com/douglasrhouston
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:51 CST