Re: parallel scaling issues

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Thu Jul 22 2010 - 13:15:04 CDT

Googling for "poll: protocol failure in circuit setup" suggests that it is
some kind of firewall issue with rsh. Try charmrun ++local +p12 namd2 to
run 12 cores on one node to confirm this. Also try ++verbose to see the
startup process in more detail.
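
To make the two suggestions concrete, here is a minimal sketch that only prints the command lines rather than running them. The file name sim.conf is a placeholder for your own NAMD configuration file, the helper function names are made up for illustration, and combining ++verbose with ++local is my assumption; ++verbose can just as well be added to your nodelist run.

```shell
#!/bin/sh
# Sketch only: print (rather than run) the two diagnostic command lines.
# "sim.conf" is a hypothetical stand-in for the real NAMD configuration file.

# Single-node test: ++local asks charmrun to start every process on this
# machine, so a success here points at the remote (rsh) startup path.
local_test_cmd() {
    cores=$1
    config=$2
    echo "charmrun ++local +p${cores} namd2 ${config}"
}

# Same run with ++verbose so charmrun reports its startup steps.
verbose_test_cmd() {
    cores=$1
    config=$2
    echo "charmrun ++local ++verbose +p${cores} namd2 ${config}"
}

local_test_cmd 12 sim.conf
verbose_test_cmd 12 sim.conf
```

Copy the printed lines into your shell (with the full path to charmrun if it is not on your PATH) once the configuration file name is filled in.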

What is the network between your two nodes? If it's not running at
gigabit speeds you're probably saturating the connection. If this is the
case then 8 cores on two nodes would run significantly slower than 8 cores
on one node.
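
As a rough check on the CPU figures in the quoted message below (about 99% on 8 cores versus about 50% on 16 cores), the sketch here multiplies cores by utilization. The function name is made up for illustration and the percentages are the approximate values read from top, so treat the result as a back-of-the-envelope estimate only.

```shell
#!/bin/sh
# Sketch only: total CPU work as cores x percent utilization.
# Dividing the result by 100 gives the number of fully busy cores.
busy_core_percent() {
    cores=$1
    percent=$2
    echo $(( cores * percent ))
}

one_node=$(busy_core_percent 8 99)     # 8 cores at about 99%
two_nodes=$(busy_core_percent 16 50)   # 16 cores at about 50%

echo "one node:  ${one_node} core-percent"    # 792
echo "two nodes: ${two_nodes} core-percent"   # 800
```

Both cases come out at roughly 8 fully busy cores, which matches the two runs going at the same speed and would fit the extra cores spending about half their time waiting, possibly on the network.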

-Jim

On Wed, 21 Jul 2010, Hyonseok Hwang wrote:

> Dear NAMD users,
>
> I set up a 6-core Xeon Linux cluster (Xeon X5670), which means that each node
> has 12 cores. I just downloaded the "NAMD 2.7b3 Linux-x86_64-TCP" binary code
> from NAMD website and was running it in the cluster under CentOS 5.3. Now I
> have two issues.
>
> The first issue is that when I try to use 12 cores in one node, one error
> message saying "poll protocol failure in circuit setup" appears, and namd2
> stops running. Of course, if I use 8 cores, there is no problem, but if I use
> more than 8 cores, I have the same error message. I'm wondering if the
> maximum number of cores in one node is limited to 8 in NAMD 2.7b3.
> The command I issue is "/opt/local/namd27b3/bin/charmrun namd2 ++node.list
> temp.nodelist +p12 configuration file".
>
> The second issue is that when I use 8 cores in one node and look at the %CPU
> using "top" command, I can see almost 99% CPU usage in 8 cores. However, when
> I use 16 cores in two nodes and look at the same thing, I can see only 50%
> CPU usage in 16 cores. As a result, both calculations are running at the same
> speed. I don't know why. FYI, the total number of atoms in my system is
> 40176, and the number of patches is 2 x 2 x 4. I'm running the simulation
> with the PME dimension of 64x64x120.
> I use "rsh" for communication and the command I issue is the same as above.
>
> Thank you in advance.
>
> Hyon
>
>
> --
> =============================================
> Hyonseok Hwang
> Assistant Professor
> Department of Chemistry
> Kangwon National University
> 192-1 Hyoja-2-dong
> Chuncheon, Gangwon-do 200-701, Rep. of Korea
> ---------------------------------------------
> Office:Room 313, Natural Sciences Building #1
> Tel:+82-033-250-8497 (office)
> Fax:+82-033-253-7582
> Email:hhwang_at_kangwon.ac.kr
> =============================================
>
