Re: Charmrun: error on request socket--

From: Kwee Hong (joyssstan0202_at_gmail.com)
Date: Wed Oct 27 2010 - 21:01:33 CDT

I received a mail delivery failure notice from an unknown sender, so I'm not
sure whether this email was sent. I am therefore resending it. Apologies if
you have already received this post.

Regards,
Joyce

On Thu, Oct 28, 2010 at 1:08 AM, Kwee Hong <joyssstan0202_at_gmail.com> wrote:

> Hi all,
>
> I ran my simulation on a 14-node cluster and got this error message:
>
>
> WRITING COORDINATES TO DCD FILE AT STEP 1605500
> WRITING COORDINATES TO RESTART FILE AT STEP 1605500
> ERROR: Error on renaming file 2mrt_md_extend.restart.coor to
> 2mrt_md_extend.restart.coor.old: Invalid cross-device link
> FATAL ERROR: Unable to open binary file 2mrt_md_extend.restart.coor: File
> exists
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: Unable to open binary file
> 2mrt_md_extend.restart.coor: File exists
>
> After I posted the error to the mailing list, we got it solved; it was due
> to file permissions. Some time later, a similar error occurred, this time
> with some extra notes:
>
> ERROR: Error on renaming file ZN_wb_md.restart.coor to
> ZN_wb_md.restart.coor.old: Invalid cross-device link
> FATAL ERROR: Unable to open binary file ZN_wb_md.restart.coor: File
> exists
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: Unable to open binary file ZN_wb_md.restart.coor:
> File exists
>
> [0] Stack Traceback:
>  [0:0] CmiAbort+0x5c [0xb4521c]
>  [0:1] _Z8NAMD_errPKc+0x9d [0x520c99]
>  [0:2] _ZN6Output17write_binary_fileEPciP6Vector+0x17e [0x98619e]
>  [0:3] _ZN6Output26output_restart_coordinatesEP6Vectorii+0x1b5 [0x986003]
>  [0:4] _ZN6Output10coordinateEiiP6VectorP11FloatVectorR7Lattice+0x12b [0x985c57]
>  [0:5] _ZN24CkIndex_CollectionMaster39_call_receivePositions_CollectVectorMsgEPvP16CollectionMaster+0x18f [0x533603]
>  [0:6] CkDeliverMessageFree+0x21 [0xa863df]
> Charmrun: error on request socket--
> Socket closed before recv.
>
> This time I doubt the problem has to do with file permissions. We are
> using an NFS parallel file system on the cluster. We export the NFS share
> with the (rw,sync,no_subtree_check,no_root_squash) options.
>
>
> Is there any way to tackle this?
>
> Thanks
>
> Regards,
> Joyce
>
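
For reference, "Invalid cross-device link" is the standard message for the
EXDEV error, which a rename call returns when the old and new names are not
on the same mounted file system. The sketch below is not NAMD code; it is a
small Python illustration, under that assumption, of how one might check
whether two paths share a device and how a backup step could fall back to
copy-and-delete when a rename fails with EXDEV. The helper names
same_device and backup_file, and the file name demo.restart.coor, are
hypothetical. Whether this is the actual cause of the error reported above
is not established.

```python
import errno
import os
import shutil


def same_device(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def backup_file(src, dst):
    """Rename src to dst; on EXDEV, copy the file and delete the original.

    Returns "renamed" or "copied" so the caller can see which branch ran.
    Any error other than EXDEV is re-raised unchanged.
    """
    try:
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # rename cannot move data between file systems, so copy instead.
        shutil.copy2(src, dst)
        os.remove(src)
        return "copied"
```

A quick check on the cluster would be to call same_device on the output
directory and on an existing restart file; if it returns False, the rename
from demo.restart.coor to demo.restart.coor.old would be expected to fail
with this error.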
