Re: problem with running namd on Linux cluster

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Tue Jan 20 2004 - 11:18:20 CST

Hi,

I just Googled the error message:
VMI/GM: send completed with send error 0x00000012
Here is what it says:

What does VMI/GM: send completed with send error 0x00000012 mean?

      VMI/GM: send completed with send error 0x00000012 means there is a
problem with the Myrinet network on one or more nodes. If this error
occurs, please send email to consult_at_ncsa.uiuc.edu and send the standard
error and output files from the batch job that failed.

So you should send an email to consult_at_ncsa.uiuc.edu to report this error, and include the standard error and output files from the failed job.
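Before writing to consult, it may help to summarize which processes reported the error. Below is a minimal sketch in Python that scans a job's error log for lines like the ones quoted below and counts them per process. The regular expression is copied from the log format shown in this thread. Reading the number in parentheses as the reporting process's rank is an assumption, and the script name and log file name in the usage line are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines such as "VMI/GM: (22) send completed with send error 0x00000012".
# Assumption: the number in parentheses identifies the reporting process/rank.
SEND_ERROR = re.compile(
    r"VMI/GM: \((\d+)\) send completed with send error (0x[0-9a-fA-F]+)"
)

def tally_send_errors(lines):
    """Count VMI/GM send-error lines per (rank, error code)."""
    counts = Counter()
    for line in lines:
        m = SEND_ERROR.search(line)  # search() also tolerates "> " quote prefixes
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python tally_vmi_errors.py namd_job.err
    with open(sys.argv[1]) as f:
        for (rank, code), n in sorted(tally_send_errors(f).items()):
            print(f"rank {rank}: {n} x send error {code}")
```

A short per-rank summary like this is only a convenience. The full stderr and stdout files are still what consult asks for.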

Gengbin

On Mon, 19 Jan 2004, zhilei chen wrote:

> Hi,
>
> I have some problems running NAMD 2.5 on a Linux cluster. I am not
> sure whether it is a software problem or a problem with the machine. I ran
> the job on "titan.ncsa.uiuc.edu". It worked before with 8 nodes (16
> processors), but failed with 16 nodes (32 processors). The job ran
> for 24 hours (the time limit) and I didn't get any output files
> except the error log file.
> Here is part of the error log file I obtained. It seems that NAMD
> started properly but somehow got stuck after it started.
> I don't know what to do now. Any suggestions?
>
> Thanks
>
> Zhilei
>
>
> -----------------------------
> log file:
>
> ...................
> Info: 666 IMPROPERS
> Info: 0 EXCLUSIONS
> Info: 40113 RIGID BONDS
> Info: 86157 DEGREES OF FREEDOM
> Info: 14670 HYDROGEN GROUPS
> Info: TOTAL MASS = 256975 amu
> Info: TOTAL CHARGE = 2.89269e-06 e
> Info: *****************************
> Info: Entering startup phase 0 with 98896 kB of memory in use.
> Info: Entering startup phase 1 with 98896 kB of memory in use.
> Info: Entering startup phase 2 with 101144 kB of memory in use.
> Info: Entering startup phase 3 with 101480 kB of memory in use.
> Info: PATCH GRID IS 4 (PERIODIC) BY 5 (PERIODIC) BY 3 (PERIODIC)
> Info: REMOVING COM VELOCITY -0.0626917 -0.0199185 -0.0476978
> Info: LARGEST PATCH (25) HAS 745 ATOMS
> VMI/GM: (22) send completed with send error 0x00000012
> VMI/GM: (23) send completed with send error 0x00000012
> VMI/GM: (31) send completed with send error 0x00000012
> VMI/GM: (30) send completed with send error 0x00000012
> VMI/GM: (21) send completed with send error 0x00000012
> VMI/GM: (20) send completed with send error 0x00000012
> VMI/GM: (25) send completed with send error 0x00000012
> VMI/GM: (24) send completed with send error 0x00000012
> VMI/GM: (28) send completed with send error 0x00000012
> VMI/GM: (29) send completed with send error 0x00000012
> VMI/GM: (26) send completed with send error 0x00000012
> VMI/GM: (27) send completed with send error 0x00000012
> VMI/GM: (15) send completed with send error 0x00000012
> VMI/GM: (14) send completed with send error 0x00000012
> VMI/GM: (15) send completed with send error 0x00000012
> VMI/GM: (14) send completed with send error 0x00000012
> VMI/GM: (15) send completed with send error 0x00000012
> VMI/GM: (14) send completed with send error 0x00000012
> VMI/GM: (15) send completed with send error 0x00000012
> VMI/GM: (14) send completed with send error 0x00000012
> VMI/GM: (15) send completed with send error 0x00000012
> VMI/GM: (14) send completed with send error 0x00000012
> VMI/GM: (15) send completed with send error 0x00000012
> VMI/GM: (14) send completed with send error 0x00000012
> VMI: Termination request on node 28. Requesting clean shutdown.
> VMI: Termination request from node 28. Node 31 cleanly shutting down.
> VMI: Sending shutdown msg ...Done.
> VMI: Termination handler executed. Node 28 going down with sigcode 5
> VMI: Sending shutdown msg to our lower peer.
> VMI: Termination request on node 28. Requesting clean shutdown.
> VMI: Sending shutdown msg to our lower peer.
> VMI: Termination request from node 28. Node 25 cleanly shutting down.
> VMI: Sending shutdown msg ...Done.
> VMI: Termination handler executed. Node 28 going down with sigcode 5
> VMI: Termination request from node 28. Node 15 cleanly shutting down.
> VMI: Sending shutdown msg ...Done.
> VMI: Termination request from node 28. Node 18 cleanly shutting down.
> VMI: Sending shutdown msg ...Done.
> VMI: Termination handler executed. Node 28 going down with sigcode 5
> VMI: Termination handler executed. Node 28 going down with sigcode 5
> VMI: Termination request from node 28. Node 26 cleanly shutting down.
> .........
>
>

Gengbin Zheng
============
(217)244-3667 (o)
gzheng_at_uiuc.edu
zhenggb_at_acm.org
http://www.ks.uiuc.edu/~gzheng/personal.html

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:38:22 CST