Re: Input/output error

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue May 18 2010 - 04:18:20 CDT

2010/5/18 王棽 <corarbor_at_163.com>:
> Dear NAMD users:
> I am running NAMD on the Dawning5000A supercomputer,
> "http://www.ssc.net.cn/en/resources.asp". However, I found my NAMD processes
> vulnerable on such a platform. They usually die with an input/output error
> on the *.restart.coor, *.restart.vel or *.restart.xsc files. Here is an
> example of the standard output:
>
[...]

> I contacted the engineers of the supercomputer center, and they found
> that there was a temporary Lustre terminal connection break and reconnect
> event whenever such an input/output error happened, which is quite often
> observed in the communication between compute nodes and OSS nodes.
>
> Do you have any suggestions for this problem?

call your "super" engineers again and tell them to do their job!

this is definitely a problem of the machine and its configuration.
i find it pretty hilarious that the system managers tell you that
they see this error happen and imply that it is a failure of your
application. NAMD is being used on lustre file systems at
a very large scale (NCSA's abe and lincoln clusters, NICS'
cray xt5 and others) successfully.

and NAMD is not really putting a large strain on the I/O
subsystem. other programs should create much worse issues.

cheers,
    axel..

> Cheers.
> Shen.

-- 
Dr. Axel Kohlmeyer    akohlmey_at_gmail.com
http://sites.google.com/site/akohlmey/
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:55:47 CST