From: Douglas Houston (DouglasR.Houston_at_ed.ac.uk)
Date: Thu Apr 16 2015 - 05:10:12 CDT
I can't recall off the top of my head what we did to resolve that
particular issue (I do remember having to switch off firewalls),
although the thread continues here:
but I can tell you it was a waste of time anyway: even if you can
get it to run, the speedup from running NAMD across multiple nodes is
essentially nonexistent if they are linked via a standard ethernet
network using cheap consumer-grade switches. Others have reported the
same to me.
It's probably related to latency; NAMD appears to have strict network
hardware requirements that are not actually published.
This is the case for a single simulation; things like REMD might work better.
Quoting Michael Charlton <michael.charlton_at_inhibox.com> on Thu, 16 Apr
2015 09:37:29 +0100:
> Hi Douglas,
> I have been following your problem with getting NAMD running in
> parallel on
> It seems that I have an identical problem and I cannot see a final
> resolution on this thread. Can you tell me if you managed to get it
> working and what your solution was?
> (I am running on a CentOS/Rocks cluster where my file space is
> shared between all nodes, and hence they are all trying to read the
> same ssh keys, which seems to cause the clash).
> Many thanks,
> Michael Charlton, InhibOx
Quoting Nicholas M Glykos <glykos_at_mbg.duth.gr> on Mon, 22 Sep 2014
11:40:59 +0300 (EEST):
>> Thank you very much for that, it is supremely helpful. I am going to try
>> to replicate the various benchmarking tests you describe in your link. To
>> that end, I wonder if you would be able to supply me with the 60,000-atom
>> ionized.psf and heat_out.coor files you used so that my steps match yours
>> as closely as possible?
> I don't expect scaling problems to be protein-specific. We were getting
> reasonable scaling with the ApoA1 benchmark distributed by the NAMD
> developers (see
> for a more recent test with NAMD 2.8 + CUDA). The measurements stop at
> four nodes because we only had four nodes with GPUs :-)
>> If not (you hint that your tests were done a long time ago), I wonder if
>> you ever looked into total bandwidth usage in your tests? 64 minutes of
>> simulation runtime across 2 of my nodes results in a total of 390GiB of
>> data transferred between them (according to ifconfig) - this equates to
>> about 100 MiB/sec. This is for my 80,000-atom system. Does this mean that
>> my network bandwidth could indeed be saturating (100 MiB/sec being not far
>> off 1Gbit/sec)? If this is true, it is not clear to me why the data
>> transfer rate is so high in my case but not yours.
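A quick sanity check of the figures quoted above (390 GiB moved in 64
minutes, set against the theoretical gigabit line rate with protocol
overhead ignored) takes only a few lines of Python:

# Sanity check of the bandwidth figures quoted above.
transferred_gib = 390                                  # reported by ifconfig
runtime_s = 64 * 60                                    # 64 minutes of runtime

observed_mib_s = transferred_gib * 1024 / runtime_s    # ~104 MiB/s
gige_mib_s = 1e9 / 8 / 2**20                           # ~119 MiB/s, theoretical GigE ceiling

print(f"observed   : {observed_mib_s:.0f} MiB/s")
print(f"GigE limit : {gige_mib_s:.0f} MiB/s")
print(f"link usage : {100 * observed_mib_s / gige_mib_s:.0f}%")

That works out to roughly 104 MiB/s observed against roughly 119 MiB/s
theoretical, i.e. the link really is close to saturation, consistent with
the "not far off 1 Gbit/sec" estimate above.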
> As Axel said, latency is probably more important. Have you benchmarked
> your network with a tool like NetPIPE? (See
> http://norma.mbg.duth.gr/index.php?id=about:benchmarks:network for an
> example; this was again back in 2009, so there may be much better tools
> around these days.)
> Nicholas M. Glykos, Department of Molecular Biology
> and Genetics, Democritus University of Thrace, University Campus,
> Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
> Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
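As a very rough stand-in for the NetPIPE test suggested above, a minimal
TCP ping-pong between two nodes gives a first estimate of round-trip
latency. This is only a sketch (the port number is a placeholder, and it
times plain TCP round trips rather than anything NAMD/Charm++ actually
does); a dedicated tool such as NetPIPE will give far more reliable
numbers:

# Minimal TCP ping-pong latency estimate (rough sketch, not NetPIPE).
# Start this script with the argument "server" on one node, then with
# "client <server-hostname>" on the other node.
import socket
import sys
import time

PORT = 5678          # placeholder; use any free port on your cluster
MSG = b"x"           # 1-byte payload so the timing is latency-dominated

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while data := conn.recv(1):
                conn.sendall(data)            # echo each byte straight back

def client(host, rounds=1000):
    with socket.create_connection((host, PORT)) as s:
        s.sendall(MSG); s.recv(1)             # warm-up round trip
        t0 = time.perf_counter()
        for _ in range(rounds):
            s.sendall(MSG)
            s.recv(1)
        elapsed = time.perf_counter() - t0
        print(f"mean round trip: {elapsed / rounds * 1e6:.1f} microseconds")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])

On a healthy gigabit LAN the mean round trip is typically well under a
millisecond; figures much larger than that point to exactly the kind of
latency problem discussed above.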
Dr. Douglas R. Houston
Institute of Structural and Molecular Biology
Room 3.23, Michael Swann Building
University of Edinburgh
Edinburgh, EH9 3JR, UK
Tel. 0131 650 7358
-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.