Re: load balancer, athlon 64 dual core

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Wed Sep 13 2006 - 15:33:49 CDT

It's definitely getting stuck in some kind of load balancer loop. Is
there any output just before the hang? You should at least get one LDB
line that might be useful.
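
To pull those lines out of a long log, something like the following could work. It is only a minimal sketch: it assumes the load balancer messages start with the prefix "LDB", and both the helper name `ldb_lines` and the sample log lines are made up for illustration, not real NAMD output.

```python
def ldb_lines(log_lines, tail=5):
    """Return the last `tail` lines that start with the assumed "LDB" prefix."""
    hits = [line.rstrip("\n") for line in log_lines if line.startswith("LDB")]
    return hits[-tail:]


# Hypothetical log excerpt, for illustration only.
sample = [
    "TIMING: example timing message\n",
    "LDB: example load balancer message\n",
    "ENERGY: example energy message\n",
]

for line in ldb_lines(sample):
    print(line)
```

For a real run, the same function can be given an open file object instead of the sample list, since it only iterates over lines.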

-Jim

On Wed, 13 Sep 2006, Leandro Martínez wrote:

> Hi all,
> I'm still having problems running namd2 on our Athlon 64 Dual Core
> machines. The simulation runs well up to a point where all processes
> except one stop, leaving a single process running on a single CPU.
> The simulation does not crash, but it does not continue either, and
> this single process appears to run forever doing something I cannot
> identify.
>
> Now, as Jim suggested, I have attached gdb to this process. I have
> never used it before, but I was able to get the information below.
> Any help is appreciated. I believe the bolded output below is the
> one referring to the namd2 process.
>
> ------------ OUTPUT FROM GDB: --------------------------
>
> Attaching to program: /usr/bin/namd2, process 19438
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libstdc++.so.6
> Reading symbols from /lib64/libc.so.6...
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> Reading symbols from /lib64/libnss_files.so.2...
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x0000000000714caa in Set::find ()
> (gdb) next
> Single stepping until exit from function _ZN3Set4findEP10InfoRecord,
> which has no line number information.
> 0x00000000006f407f in Rebalancer::numAvailable ()
> (gdb) next
> Single stepping until exit from function
> _ZN10Rebalancer12numAvailableEP11computeInfoP13processorInfoPiS4_S4_,
> which has no line number information.
> 0x00000000006f3f34 in Rebalancer::refine_togrid ()
> (gdb) next
> Single stepping until exit from function
> _ZN10Rebalancer13refine_togridERA3_A3_A2_NS_6pcpairEdP13processorInfoP11computeInfo,
> which has no line number information.
> 0x00000000006f23b5 in Rebalancer::refine ()
> (gdb) next
> Single stepping until exit from function _ZN10Rebalancer6refineEv,
> which has no line number information.
> -----------------------------------------------------------------------
> From this point on, nothing happens.
>
> Thank you very much,
> Leandro.
>
> --------------------------------------------------------------------
> Leandro Martinez
> Institute of Chemistry
> State University of Campinas, Brazil
> http://www.ime.unicamp.br/~martinez/packmol
> --------------------------------------------------------------------
>
