From: Leandro Martínez (leandromartinez98_at_gmail.com)
Date: Fri Aug 25 2006 - 08:01:53 CDT
Just to clarify the problem a little more.
I have now run the simulation on a single node (the
master machine), which has two processors. It starts
running fine, with two jobs, one on each processor, each
using almost all of the CPU, as expected,
but eventually it prints the message:
Info: Adjusted background load on 1 nodes.
After that, the simulation runs on only one processor.
Any clue on what may be going wrong?
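
To help pin down when this happens, the log can be scanned for the message above. This is only a minimal sketch in Python: the function name and the sample log are hypothetical, and the pattern assumes the message always has exactly the wording quoted in this thread.

```python
import re

# Wording copied from the messages quoted in this thread; it is an
# assumption that NAMD always prints the message in exactly this form.
PATTERN = re.compile(r"^Info: Adjusted background load on (\d+) nodes\.")


def find_background_load_adjustments(lines):
    """Return (line_number, node_count) for every matching log line.

    Hypothetical helper, not part of NAMD. Line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = PATTERN.match(line.strip())
        if match:
            hits.append((number, int(match.group(1))))
    return hits


# Hypothetical sample log, built only from lines quoted in this thread.
sample_log = [
    "ENERGY: 644800 804.7671 2363.3700 1332.0255",
    "Info: Adjusted background load on 11 nodes.",
    "Info: Adjusted background load on 1 nodes.",
]

for line_number, nodes in find_background_load_adjustments(sample_log):
    print(f"line {line_number}: background load adjusted on {nodes} node(s)")
```

To use it on a real run, pass the open log file in place of sample_log. Comparing the reported line numbers with the nearest ENERGY lines shows at which step each adjustment was printed.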
On 8/25/06, Leandro Martínez <leandromartinez98_at_gmail.com> wrote:
> Hi all,
> I'm running a simulation with NAMD_2.6b2_Linux-amd64-TCP on
> a cluster of nine Athlon64 nodes (each processor is dual-core,
> so there are effectively 18 processors). I'm having some
> strange problems with simulations I have already run on several
> other machines, and I have not been able to find a solution.
> Basically, I start the simulation and eventually it either
> stops without printing any error message or apparently starts
> running on only one processor. The only message I have
> observed that differs from our previous runs is this one:
> Info: Adjusted background load on 11 nodes.
> That message is printed the first time load balancing is
> performed. The error does not necessarily occur right after it,
> but it may be part of the problem, since the simulation was
> set to run on 18 processors (9 nodes).
> The only time I got an error message, it was the one below. As you
> can see, it was printed after quite a long simulation time.
> The error is not easily reproducible: it always happens,
> but not at the same point of the simulation each time.
> Any help or idea will be appreciated.
> ENERGY: 644800 804.7671 2363.3700 1332.0255
> 131.9843 -201929.9812 17508.6136 0.0000 0.0000
> 32575.8361 -147213.3846 297.3932 -147116.7637
> -147117.4476 296.8970
> Stack Traceback:
>  /lib64/libc.so.6 [0x360b32f7c0]
>  _ZN17ComputeHomeTuplesI8BondElem4bond9BondValueE6doWorkEv+0x5c4
>  _ZN11WorkDistrib12enqueueBondsEP12LocalWorkMsg+0x16 [0x727b16]
>  CkDeliverMessageFree+0x21 [0x785aab]
>  _Z15_processHandlerPvP11CkCoreState+0x455 [0x7850b5]
>  CsdScheduleForever+0xa2 [0x7f1752]
>  CsdScheduler+0x1c [0x7f1350]
>  _Z10slave_initiPPc+0x10 [0x4bb034]
>  _ZN7BackEnd4initEiPPc+0x28f [0x4bb019]
>  main+0x47 [0x4b697f]
>  __libc_start_main+0xf4 [0x360b31d084]
>  _ZNSt8ios_base4InitD1Ev+0x42 [0x4b2c9a]