From: Michael Grabe (mgrabe_at_itsa.ucsf.edu)
Date: Mon Mar 07 2005 - 15:30:48 CST
Gengbin,
You were correct. I turned off the firewall on my master node
and everything worked fine. It works with
the slave nodes still having their firewalls turned on, with the
same configuration the master node had (when it didn't work).
I have two questions:
1) When I use top to check namd2 on the nodes, it seems that it
is running with a low priority and only using between 68% and 90% of the
CPU. How can I change my command line so that it spawns all of
the processes with high priority?
2) I really want to get my firewall up and running on the master node
while still running NAMD2 in parallel. So which ports should I open
so that communication works? Is this documented anywhere?
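One way to confirm the low-priority observation in question 1 is to read the nice value of each namd2 process. The sketch below is a hypothetical helper (not part of NAMD or Charm++), assuming Python 3 on a Unix-like system such as OS X. A positive nice value means the process is running with reduced priority; the standard `renice` command can change it afterwards, although raising priority (lowering the nice value) usually requires root.

```python
import os
import sys


def nice_value(pid: int) -> int:
    """Return the scheduling nice value of process `pid` (Unix only).

    pid 0 means "the calling process". Higher values mean lower priority.
    """
    return os.getpriority(os.PRIO_PROCESS, pid)


def main(argv: list) -> None:
    # Hypothetical usage: pass the PIDs of the namd2 processes shown by top.
    # Falls back to this script's own PID when no numeric argument is given.
    pids = [int(a) for a in argv if a.isdigit()] or [os.getpid()]
    for pid in pids:
        try:
            print(f"pid {pid}: nice {nice_value(pid)}")
        except OSError as exc:  # e.g. the process no longer exists
            print(f"pid {pid}: {exc}")


if __name__ == "__main__":
    main(sys.argv[1:])
```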
Thanks for all of your help,
michael
G5 Xserve cluster running OS X 10.3.8
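The firewall question comes down to whether the compute nodes can open a connection back to the charmrun process on the master node. A quick way to test one specific port through the master's firewall is a plain TCP connect attempt from a compute node. This is a hypothetical helper, assuming Python 3 on the nodes; the host and port are supplied on the command line because, as far as I know, charmrun chooses its listening port at run time rather than using a fixed one. It only tests TCP: Charm++'s net- builds may also exchange node-to-node messages over UDP, which this check does not cover.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    A firewall that silently drops packets usually shows up as a timeout;
    one that rejects them (or no listener on the port) shows up as a refusal.
    Both are reported here as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, timed-out and DNS failures
        return False


if __name__ == "__main__":
    # Hypothetical usage, run on a compute node:  python3 check_port.py <master-host> <port>
    args = sys.argv[1:]
    if len(args) == 2 and args[1].isdigit():
        host, port = args[0], int(args[1])
        print(f"{host}:{port} reachable: {can_connect(host, port)}")
    else:
        print("usage: check_port.py <host> <port>")
```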
On Mar 7, 2005, at 10:52 AM, Gengbin Zheng wrote:
>
> Show me the full output with ++verbose. I suspect your charmrun IP was
> resolved to "localhost" rather than its real IP; using localhost, the
> compute nodes cannot connect to the charmrun process over a socket.
>
> Gengbin
>
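Gengbin's suspicion can be checked directly on the master node: if the machine's own host name resolves to a loopback address (anything in 127.0.0.0/8), then an address derived from that name is unreachable from the other nodes. The sketch below assumes Python 3 and that charmrun takes its address from the host name, as Gengbin's diagnosis implies; the helper name is hypothetical.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname: str) -> bool:
    """Return True if `hostname` resolves (IPv4) to a loopback address such as 127.0.0.1."""
    addr = socket.gethostbyname(hostname)
    return ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        addr = socket.gethostbyname(name)
    except socket.gaierror as exc:
        print(f"{name!r} does not resolve: {exc}")
    else:
        status = "LOOPBACK (other nodes cannot reach this)" if resolves_to_loopback(name) else "ok"
        print(f"{name} -> {addr}: {status}")
```

If this reports a loopback address, the usual fix is to make the host name resolve to the machine's real network address (for example via /etc/hosts or DNS).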
> Michael Grabe wrote:
>
>> Gengbin,
>>
>> I can get namd2 to run on each node separately, and using ++local, I
>> can even get charmrun to use both processors on each node. (Both of
>> these binaries are from the NAMD Mac OS X tar file.)
>>
>> -michael
>>
>>
>> On Mar 7, 2005, at 9:17 AM, Gengbin Zheng wrote:
>>
>>
>> Michael,
>>
>> If you are using precompiled NAMD binaries, first make sure they
>> run on one processor, since running sequentially normally gives
>> you more sensible errors. So just run it in standalone mode,
>> like this on the G5:
>> ./namd2 src/alanin
>>
>> It may not run even sequentially (for various reasons:
>> incompatible libraries, missing shared libs, etc.), in which case,
>> in the parallel run, charmrun cannot connect to the node processes
>> and times out.
>>
>> Gengbin
>>
>>
>> Filippo Gioachin wrote:
>>
>> Maybe someone can give him an answer...
>>
>> ---------- Forwarded message ----------
>> Date: Sun, 6 Mar 2005 15:07:07 -0800
>> From: Michael Grabe <mgrabe_at_itsa.ucsf.edu>
>> To: Filippo Gioachin <gioachin_at_uiuc.edu>
>> Subject: Re: question
>>
>> Filippo,
>>
>> I am running a cluster of G5 Xserves with OS X 10.3.8, and I
>> have an Asante Gigaswitch between them.
>>
>> I am a NAMD person, so I am using the precompiled charmrun
>> that comes with the NAMD_2.5 binaries. I have used it on the
>> computers at the Pittsburgh Supercomputing Center, but never on
>> my own cluster, which I just set up. I get this error when I try
>> to start a job on 2 nodes using three processors (I have
>> 2 processors per node):
>>
>> Charmrun> charmrun started...
>> Charmrun> using ./nodelist as nodesfile
>> Charmrun> rsh (profplum:0d) started
>> Charmrun> rsh (mrswhite:1d) started
>> Charmrun> rsh (profplum:2d) started
>> Charmrun> node programs all started
>> Charmrun> error 0 attaching to node:
>> Timeout waiting for node-program to connect
>>
>> I can ssh into each node without a password (I set
>> CONV_RSH=ssh), and all of the nodes have access to the same
>> binaries.
>>
>> I should really direct this to the NAMD mailing list, but I
>> thought I would just ask, since I think I am having charmrun
>> problems.
>>
>> Thanks for any help you can give me.
>>
>> -Michael
>>
>>
>>
>> On Mar 6, 2005, at 2:27 PM, Filippo Gioachin wrote:
>>
>>
>>
>> Michael,
>>
>> All the precompiled versions have been tested on their
>> architecture before being gzipped. If you get a binary that fits
>> your system, you should have no problems: you should be able to
>> compile your programs and run them, or directly run the
>> precompiled binaries of the examples/tests (these are under the
>> net-linux/examples directory if you downloaded the net-linux
>> version) without any parallel library.
>>
>> You didn't specify which binary you downloaded, so I'm not sure
>> this is completely true for what you got. The net- versions do not
>> use any particular parallel library and should just work.
>>
>> From your question I couldn't tell whether you had already tried
>> and failed. If so, please let me know which binary you downloaded
>> and what error you get.
>>
>> thanks,
>> Filippo
>>
>>
>>
>>
>>
>> ----------------------------------------------------------------------
>> Michael Grabe, Ph.D.
>> Post-doctoral Fellow
>> Howard Hughes Medical Institute
>> University of California, San Francisco
>> 1550 4th Street GD482
>> San Francisco, CA 94143
>> mgrabe_at_itsa.ucsf.edu
>> tel: ++ 415.476.4021
>> http://itsa.ucsf.edu/~mgrabe
>>
>>
>>
>>
>>
>>
>
>
>
------------------------------------------------------------------------
Michael Grabe, Ph.D.
HHMI/UCSF
Genetics Development & Behavioral Science Building
1550 4th Street, GD 481
San Francisco, CA 94143-0725
tel: ++ 415.476.0421
http://itsa.ucsf.edu/~mgrabe
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:40:33 CST