From: Kwee Hong (joyssstan0202_at_gmail.com)
Date: Wed Oct 27 2010 - 12:08:11 CDT
Hi all,
I ran my simulation on a 14-node cluster and got this error message:
WRITING COORDINATES TO DCD FILE AT STEP 1605500
WRITING COORDINATES TO RESTART FILE AT STEP 1605500
ERROR: Error on renaming file 2mrt_md_extend.restart.coor to 2mrt_md_extend.restart.coor.old: Invalid cross-device link
FATAL ERROR: Unable to open binary file 2mrt_md_extend.restart.coor: File exists
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: Unable to open binary file 2mrt_md_extend.restart.coor: File exists
After I posted the error to the mailing list, we got it solved; it was due
to the file's permissions. Some time later, a similar error occurred, this
time with some extra notes:
ERROR: Error on renaming file ZN_wb_md.restart.coor to ZN_wb_md.restart.coor.old: Invalid cross-device link
FATAL ERROR: Unable to open binary file ZN_wb_md.restart.coor: File exists
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: Unable to open binary file ZN_wb_md.restart.coor: File exists

[0] Stack Traceback:
  [0:0] CmiAbort+0x5c  [0xb4521c]
  [0:1] _Z8NAMD_errPKc+0x9d  [0x520c99]
  [0:2] _ZN6Output17write_binary_fileEPciP6Vector+0x17e  [0x98619e]
  [0:3] _ZN6Output26output_restart_coordinatesEP6Vectorii+0x1b5  [0x986003]
  [0:4] _ZN6Output10coordinateEiiP6VectorP11FloatVectorR7Lattice+0x12b  [0x985c57]
  [0:5] _ZN24CkIndex_CollectionMaster39_call_receivePositions_CollectVectorMsgEPvP16CollectionMaster+0x18f  [0x533603]
  [0:6] CkDeliverMessageFree+0x21  [0xa863df]
Charmrun: error on request socket--
Socket closed before recv.
This time I doubt the problem has anything to do with the file's
permissions. We are using an NFS parallel file system on the cluster, and we
export it with the (rw,sync,no_subtree_check,no_root_squash) options.
Is there any way to tackle this?
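For reference, "Invalid cross-device link" is the message for errno EXDEV, which a rename reports when the old and new names are not on the same mounted file system. Below is a minimal diagnostic sketch, assuming Python is available on the nodes, that could be used to check this. The helper names on_same_device and rename_or_copy are hypothetical; they are not part of NAMD and do not show how NAMD itself writes restart files.

```python
import errno
import os
import shutil


def on_same_device(path_a, path_b):
    """Return True if both existing paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def rename_or_copy(src, dst):
    """Rename src to dst; on EXDEV, fall back to copy-then-delete.

    Returns "renamed" or "copied" so the caller can see which path was taken.
    Any error other than EXDEV is re-raised unchanged.
    """
    try:
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # The two names live on different file systems: copy, then remove.
        shutil.copy2(src, dst)
        os.remove(src)
        return "copied"
```

Running on_same_device on the output directory from different nodes, or calling rename_or_copy on a scratch file there, should show whether a plain rename crosses a file-system boundary on a given node.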
Thanks
Regards,
Joyce
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:56:17 CST