Multiple replica WTM-eABF simulations interrupted during replica sharing step

From: Abhishek Acharya (abhi117acharya_at_gmail.com)
Date: Tue Jan 26 2021 - 04:48:33 CST

Hi all,

I am using NAMD 2.13 and Intel MPI 19.0.5.

One of my simulations has stopped with the following error:

colvars: shared ABF: Sharing gradient and samples among replicas at step 10546000
colvars: shared ABF: Sharing gradient and samples among replicas at step 10546000
colvars: shared ABF: Sharing gradient and samples among replicas at step 10546000
colvars: shared ABF: Sharing gradient and samples among replicas at step 10547000
colvars: shared ABF: Sharing gradient and samples among replicas at step 10547000
colvars: shared ABF: Sharing gradient and samples among replicas at step 10547000
bcn1855.437302namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437304namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437298namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437306namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437310namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437251namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437309namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437297namd2: Reading from remote process' memory failed. Disabling CMA support
bcn1855.437307namd2: Reading from remote process' memory failed. Disabling CMA support

The run directory also contains many files, each with the following backtrace:

namd2:298454 terminated with signal 11 at PC=d2122f SP=7fffffff7f40.
Backtrace:
namd2(_ZN12PmeRealSpace12fill_chargesEPPfS1_RiS2_PcS3_P11PmeParticle+0x193f)[0xd2122f]
namd2(_ZN10ComputePme6doWorkEv+0x163f)[0xa88faf]
namd2(_ZN19CkIndex_WorkDistrib29_call_enqueuePme_LocalWorkMsgEPvS0_+0xe)[0xdf321e]
namd2(CkDeliverMessageFree+0x22)[0x112f9f2]
namd2(_Z15_processHandlerPvP11CkCoreState+0x7b6)[0x1128846]
namd2(CsdScheduleForever+0x7d)[0x1245b6d]
namd2(CsdScheduler+0x1e)[0x1245aee]
namd2(_ZN7BackEnd4initEiPPc+0x3b9)[0x758ff9]
namd2(main+0xac)[0x74dccc]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaabfa1555]
namd2[0x6a2529]

Has anyone faced this issue before? I am able to continue the simulations,
though.

Sincerely,
Abhi
