From: Robert McCarrick (rob.mccarrick_at_muohio.edu)
Date: Wed Apr 06 2011 - 15:50:11 CDT
Axel,
Thanks so much for the reply. That's disappointing, as this was just a
$2,800 build to speed up some calculations for one of the labs here at
Miami (this is not my area at all, I just happen to be good with Linux).
Looking at the infiniband hardware, it would be about triple the cost
of the cluster itself.
What I will probably end up doing is writing my own little queuing
script that takes advantage of the fact that each computer is pretty
darn fast on its own: the cluster could still run through a series of
experiments by distributing the individual jobs to single computers,
each using its 6 cores and the multicore-optimized version of NAMD.
That way it will not have been a complete waste of time and money.
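A minimal round-robin dispatcher along those lines might look like the
sketch below. The hostnames (node1..node4) and the run*.conf job names
are placeholders, not anything from the actual setup; the real launch
line is shown commented out since it depends on the local paths, and it
assumes the passwordless SSH and NFS-shared directory described further
down in this thread.

```shell
#!/bin/sh
# Sketch of a round-robin dispatcher for independent NAMD runs.
# Hostnames node1..node4 and the run*.conf names are illustrative only.
set -- node1 node2 node3 node4   # positional parameters hold the nodes
n=$#
i=0
while read conf; do
  i=$(( i % n + 1 ))             # cycle through 1..n
  eval node=\${$i}               # pick the i-th node
  echo "$conf -> $node"
  # Real launch: one multicore NAMD job per node, using all 6 cores.
  # ssh "$node" "cd $PWD && ./namd2 +p6 $conf > ${conf%.conf}.log" &
done <<EOF
run1.conf
run2.conf
run3.conf
run4.conf
run5.conf
EOF
# wait   # with the ssh line enabled, block until all remote jobs finish
```

This is only a sketch: a real queue would also track job completion and
refill idle nodes instead of assigning jobs blindly in a fixed rotation.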
Rob
On Wed, 2011-04-06 at 13:06 -0400, Axel Kohlmeyer wrote:
>
>
> On Wed, Apr 6, 2011 at 12:19 PM, Robert McCarrick
> <rob.mccarrick_at_muohio.edu> wrote:
>
> Hi Everyone,
> Just to give more information on this. If I use the following
> command (with the TCP optimized version of NAMD for x86_64
> Linux):
>
> ./charmrun namd2 +p6 <configuration_file>
>
> I get a time of 0.0290901 s/step, with 6 processes running
> on the main computer and a system load of 1.32. If I use the
> following command:
>
> ./charmrun namd2 +p24 <configuration_file>
>
> I get a time of 0.0497453 s/step, with 6 processes on each
> of the four computers; however, the main computer, on which
> I executed the command, has a load of only 0.53 and each of
> the other three has a load of about 0.01, indicating that
> they aren't really doing much of anything even though they
> each have 6 namd processes running. I
> have a nodelist file and all of the computers can SSH to each
> other without a password. The directory in which the NAMD and
> configuration files are contained is mirrored on the other
> three computers via NFS (all of the user UIDs and GIDs and
> permissions are carrying over fine). I've been searching
> online and haven't found any way to address this. As
> mentioned in the previous email, I also compiled the
> mpi-Linux-x86_64 version and it doesn't seem to help the
> problem. Any help would be greatly appreciated.
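For reference, a charmrun nodelist file for a four-node setup like the
one described is just a plain-text file along these lines (the
hostnames are placeholders, not the actual machine names):

```
group main
  host node1
  host node2
  host node3
  host node4
```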
>
>
>
>
> rob,
>
>
> TCP/IP networking doesn't give you great scaling, because of the
> high latencies. classical MD is quite sensitive to that, since you
> need to communicate multiple times in each time step and the
> computing effort for each step is rather small.
>
>
> now NAMD can do _some_ latency hiding, and thus does much better over
> TCP/IP than most other codes that i know. nevertheless, with 6 cores
> per node, you are really pushing the limit. you may benefit from the
> multi-core version that is now provided with version 2.8b1, as that
> will limit the communication to one task (instead of 6 tasks fighting
> for access to the network).
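The multi-core build mentioned here is launched directly, without
charmrun; on one of these 6-core nodes the invocation would look
something like this (the configuration and log file names are
placeholders):

```
# multicore NAMD build: a single process with 6 worker threads on one node
./namd2 +p6 sim.conf > sim.log
```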
>
>
> if you really want good performance, you need to consider buying a
> fast low-latency interconnect. there are several of them, with
> different properties and costs. the most popular currently seems to
> be infiniband, which would be a good match here. i am seeing very
> good scaling behavior of NAMD (or rather charm++) using the IBVERBS
> library interface.
>
> cheers,
> axel.
>
>
>
> Thanks,
> Rob
>
>
>
> On Tue, 2011-04-05 at 14:51 -0400, Robert McCarrick wrote:
>
> > Hi Everyone,
> > I'm new to the cluster computer world. I've built a
> > four-computer cluster, each node with a 6-core AMD Phenom
> > processor running Ubuntu Server 10.10 64-bit. I've tried
> > both the TCP optimized version of NAMD and compiling from
> > scratch with the mpi-Linux-x86_64 build of Charm. In all
> > cases, I'm getting about a 4-fold reduction in calculation
> > speed when I run the job utilizing all four computers (i.e.
> > going from +p6 to +p24 causes a big slowdown). This seems
> > odd and I was wondering if anyone had any suggestions as to
> > where I might have gone wrong.
> > Rob
> > --
> > Robert M. McCarrick, Ph.D.
> > EPR Instrumentation Specialist
> > Department of Chemistry and Biochemistry
> > Miami University
> > 701 E. High Street
> > 101 Hughes Laboratories
> > Oxford, OH 45056
> > 513.529.0507 CW Room
> > 513.529.6829 Pulse Room
> > 513.529.5715 fax
> > rob.mccarrick_at_muohio.edu
> > http://epr.muohio.edu
>
>
>
>
>
>
> --
> Dr. Axel Kohlmeyer
> akohlmey_at_gmail.com http://goo.gl/1wk0
>
> Institute for Computational Molecular Science
> Temple University, Philadelphia PA, USA.
This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:20:05 CST