Re: linux cluster problems

From: Christophe Combet [PBIL/IBCP/CNRS] (c.combet_at_ibcp.fr)
Date: Mon Jul 17 2006 - 05:38:04 CDT

Dear Mr. Greisen,

Running NAMD under PBS takes some work.

  1) You (or your system administrator) need to define a queue for
multi-node jobs on the PBS server (with qmgr).
  2) The list of nodes to use is in the file whose path is given by
$PBS_NODEFILE (an environment variable set by PBS).
  3) However, this is a quick and dirty way to run NAMD (using charmrun
and rsh), as PBS does not handle such jobs correctly (CPU time, walltime,
suspending). You should install an MPI version of NAMD.
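
For the quick route in point 2, a job script could turn $PBS_NODEFILE into a charmrun nodelist roughly as follows. This is only a sketch under assumptions: the helper name make_nodelist, the binary name namd2 and the input file config.namd are placeholders, not taken from your setup, and the final command is printed rather than executed so you can check it first.

```shell
#!/bin/sh
# Sketch of a PBS job-script fragment (assumed names: make_nodelist,
# namd2, config.namd; adapt them to your installation).

# Convert a PBS node file (one hostname per line, repeated once per
# CPU slot) into a charmrun nodelist written to stdout.
make_nodelist() {
    echo "group main"
    while read -r host; do
        if [ -n "$host" ]; then
            echo "host $host"
        fi
    done < "$1"
}

# Only act inside a PBS job, where PBS_NODEFILE is set by the server.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    nodelist="${TMPDIR:-/tmp}/namd.nodelist.$$"
    make_nodelist "$PBS_NODEFILE" > "$nodelist"
    # One process per non-empty line of the node file.
    nprocs=$(grep -c . "$PBS_NODEFILE")
    # Printed, not run: inspect the command line before using it.
    echo "charmrun ++nodelist $nodelist +p$nprocs namd2 config.namd"
fi
```

Note that the node file is passed to charmrun through ++nodelist, ahead of the program name; an argument in the wrong position can end up being treated as the program to launch.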

Hope this helps.

Regards.

On 17 Jul 2006, at 11:20, Per Jr. Greisen wrote:

> Hey,
>
> I have submitted a job on a cluster and it is only running on one node
> even though I have specified 11 nodes.
>
> If I do qstat -f it is only running 0.93 on one of the nodes while the
> rest are zero.
> If I go into the test.o1929 file I get the following error message:
>
> Charmrun rsh(node45.6)> Cannot locate this node-program:
> /tmp/1929.1.all.q/machines
>
> Charmrun rsh(node45.6)> Exiting with error code 1
>
> What is wrong and how to fix it? Thanks
>
> I have been looking at the wiki site but I cannot see a way to solve
> the problem. I have also tried changing the ++timeout in job.sh, but
> it still doesn't work.
>
> --
> Best Regards
>
> Per Jr. Greisen
> +4528648657
>
>

--
Dr Christophe COMBET   Tel: (+33) (0)4 37 65 29 47
Fax: (+33) (0)4 72 72 26 04
Pôle BioInformatique Lyonnais - Lyon Gerland - http://pbil.ibcp.fr
IBCP (UMR 5086 CNRS - Université Lyon 1)  -  http://www.ibcp.fr
7, passage du Vercors - 69367 Lyon - FRANCE

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:43:50 CST