From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Fri May 16 2008 - 10:49:46 CDT
On Thu, 15 May 2008, Alexandre A. Vakhrouchev wrote:
AV> Hi all!
AV>
AV> During a simulation of a 2000K-atom system I got the following error message:
AV>
AV> namd2: Rank 0:29: MPI_Iprobe: ibv_poll_cq(): bad status 12
AV> namd2: Rank 0:28: MPI_Iprobe: ibv_poll_cq(): bad status 12
AV> namd2: Rank 0:28: MPI_Iprobe: self node9.eth0.mvs50k.jscc.ru peer
AV> node26.eth0.mvs50k.jscc.ru (rank: 115)
AV> namd2: Rank 0:28: MPI_Iprobe: error message: transport retry exceeded error
AV> namd2: Rank 0:29: MPI_Iprobe: self node9.eth0.mvs50k.jscc.ru peer
AV> node26.eth0.mvs50k.jscc.ru (rank: 117)
AV> namd2: Rank 0:29: MPI_Iprobe: error message: transport retry exceeded error
AV> namd2: Rank 0:28: MPI_Iprobe: Internal MPI error
AV> namd2: Rank 0:29: MPI_Iprobe: Internal MPI error
AV> MPI Application rank 29 exited before MPI_Finalize() with status 16
AV>
AV> NAMD runs for a different number of steps each time before hitting this
AV> error and failing. Sometimes it hangs during the startup phase. Simulations
AV> with fewer atoms work fine (I tried up to 200K atoms). The cluster is
AV> dual quad-core Xeon nodes on InfiniBand. I built NAMD for Linux-amd64-MPI
AV> according to the Wiki:
AV> http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdOnInfiniBand
AV> Frankly speaking, the only thing I didn't use was "-thread pthreads"
AV> for CHARMOPTS, because the linker exited with an error that no
AV> pthreads were found. Could that be the cause?
no. you are overloading your infiniband fabric.
sometimes using fewer cores per node helps. you should
contact the sysadmins of the machine and tell them
that they are in for some "fun". ;-)
i've seen this happen on several large infiniband-based
clusters and it is not easy to work around.
cheers,
axel.
--
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
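
When reporting a failure like the one quoted above to the cluster admins, it may help to summarize which node pairs show up in the errors. Below is a minimal Python sketch of that idea. It is not part of NAMD or of any MPI distribution: the function name `failing_links` and the `SAMPLE` text are made up for illustration, and the pattern assumes each record has the form `self <host> peer <host> (rank: N)` as seen in the log in this thread. Other MPI versions may format these messages differently.

```python
import re
from collections import Counter

# Assumed record format, taken from the log quoted in this thread:
#   "... self <host> peer <host> (rank: N)"
LINK_RE = re.compile(r"self\s+(\S+)\s+peer\s+(\S+)\s+\(rank:\s*(\d+)\)")

# Optional mail-quote prefix such as "AV> " at the start of a line.
QUOTE_RE = re.compile(r"^\s*\w*>\s?")

# Hypothetical sample input: two records copied from the quoted log,
# including the mail quoting and the wrapped lines.
SAMPLE = """\
AV> namd2: Rank 0:28: MPI_Iprobe: self node9.eth0.mvs50k.jscc.ru peer
AV> node26.eth0.mvs50k.jscc.ru (rank: 115)
AV> namd2: Rank 0:28: MPI_Iprobe: error message: transport retry exceeded error
AV> namd2: Rank 0:29: MPI_Iprobe: self node9.eth0.mvs50k.jscc.ru peer
AV> node26.eth0.mvs50k.jscc.ru (rank: 117)
"""


def failing_links(log_text):
    """Count how often each (self_host, peer_host) pair appears in the log."""
    # Strip quote prefixes and rejoin wrapped lines so that a record
    # split across two lines can still be matched by LINK_RE.
    lines = [QUOTE_RE.sub("", line) for line in log_text.splitlines()]
    joined = " ".join(lines)
    return Counter((m.group(1), m.group(2)) for m in LINK_RE.finditer(joined))


if __name__ == "__main__":
    for (self_host, peer_host), count in failing_links(SAMPLE).most_common():
        print(f"{self_host} -> {peer_host}: {count} error record(s)")
```

If the same pair of nodes (or the same single node) dominates the counts across several failed runs, that is worth mentioning in the report, since it may point at a specific link or adapter and not only at general load on the fabric.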